Statistical software… Excel?
Hey pals, its Tim again — we made it through all of our CTRSC staff and have come full circle with the blog. Now I actually have to have a real post. Here we go!
The answer to the subject line…
Yes. If you are in the old school group that thinks Excel is for kids, it is time to rethink your stance. Excel has improved considerably over the years and should now be considered quite useful for basic statistical methods. Of course, I’m not recommending Excel for anything fancy (e.g. regression) unless you are talking about using an add-on package. These methods are a bit more complicated than Excel can typically handle. Even if you can get results pumped out for a regression in the base Excel software, you will be extremely limited in your model diagnosis.
You might be thinking: so you are saying we can use Excel, but we shouldn’t use Excel? Well, yeah… pretty much. For basic statistics, Excel is an excellent choice since it is really easy to use. This ease of use comes with a caveat though – you have to be relatively well versed in Excel formulas to get what you need out of it. Because of these issues, I created an Excel workbook that helps investigators do a number of basic statistical tests without the need to type in their own formulas. In this post, I will go through each tab in this workbook. A link to the workbook will also be posted. I am always looking for new things to add to the file, so if you find something useful that is not currently available, send me an email and I will see about adding it in the next revision.
Tab 1: Cover
Tab 1 is just the cover page. It keeps the version and date of last revision and my contact information, as well as a note that you should not use this for your own monetary benefit. If you do, I’ll track you down and steal something from you to sell.
Tab 2: Sample size continuous
Tab 2 has a sample size calculator for superiority studies where your outcome variable is a continuous value and you have two study groups. You can choose between a level of significance of 0.01 and 0.05 (nothing else yet!), as well as a power of 90% and 80% (nothing else yet!). Be careful to enter the data as indicated (decimal format, not percent!). Next you enter your mean expected value for the outcome in your two study groups. Finally, you enter in the standard deviation for the outcome. At the moment, the assumption for this file is that the standard deviation is the same for both groups. Hopefully I’ll be able to add both standard deviations in a future release. You can also enter in an expected drop-out rate if you are so inclined. It will calculate the sample size for each option, as well as adding a 5% and 10% continuity corrected sample size.
Tab 3: Sample size categorical
Tab 3 is the sample size for superiority studies where your outcome variable is a categorical variable (dichotomous for now) and you have two study groups. You enter the data just as you do for tab 2, but you also get the option to change the ratio of patients in each of your two study groups. Be careful to read the instructions so you do the ratio correctly! The default is 50% in each group.
Tab 4: 2×2 table
Tab 4 allows you to calculate chi squared statistics for a 2×2 table, as well as odds and risk ratios with confidence intervals, and the number needed to treat (NNT)/number needed to harm (NNH). You enter your data in the blue cells and everything is auto calculated. You can also examine the table of ‘expected values’ to see if the data meet the assumptions of the chi-squared test. If they do not, you will need to use Fishers Exact, which is not currently available. The NNT/NNH will be shown as appropriate (e.g. if there is a protective effect of your main variable, the NNT will be shown and the NNH will be blank). Be sure to input your data as shown. The positive risk factor (exposure) goes on the top row, non-exposure on the bottom row, outcome positive on the left column, and ouotocme negative on the right column.
Tab 5: 2×2 totals
This tab is the same as the prior tab, except you don’t need the same four data points. Here, you can calculate the same data if you have the total with and without the outcome and the total with and without the outcome with the exposure. Sometimes this is just easier than finding the data for tab 4.
Tab 5: Compare 2 means
This tab compares the mean value of two variables using a equal variances, independent samples Student’s t-test. You only need the sample size in your two groups, as well and the mean and standard deviation in both groups.
Tab 6: Compare 2 rates
This tab compares two incidence rates using a Chi-squared test. You need the total numerator and person-time for your two groups.
Tab 7: Cleaning efficiency (ATP)
This tab is for the infection preventionists, environmental services professionals, and healthcare epidemiologists. It essentially calculates a percent difference between two ATP (adenosine triphosphate) readings when examining the cleanliness of a patient’s room. It can be used for anything though really.
Tab 8: Percent difference
This is similar to the ATP tab, but more generic. It calculates a percent difference between two values. If there is an increase, you will see data in the percent increase, and in the percent decrease if there is a decrease.
Tab 9: 2×2 confounding
This tab is slightly more complicated. It tells you if you have a confounding variable (one at a time!) and will adjust for it if needed – for when your outcome, predictor, and confounding variables are all dichotomous. You start by entering your data just as you did in tab 4. After this, you must stratify the data in the first 2×2 table by the two levels of your potential confounding variable. After that, the rest is automatic. You will see (in words) if confounding is present based on if the confounding variable is related to both the predictor and outcome and if the crude risk or odds is ≥10% different than the adjusted risk or odds (Mantel-Haenszel method).
Tab 10: 3×2 table
Now we are getting fancy. Not really. This tab is the same as the 2×2 table except when you have a predictor variable with three levels instead of two. Enter the data as you did in the 2×2 table tab.
Tab 11: 95% CI for percentage
This tab creates a binomial confidence interval for a percentage. Sometimes this is useful.
Tab 12: OR to RR
This tab will adjust an odds ratio to reflect the true risk ratio. This is used when you have a cohort study, but still calculate an odds ratio (e.g. you used logistic regression to adjust for confounding instead of a more appropriate method that gives you the risk – remember logistic only gives you odds, but in a cohort study, you should calculate the risk!). Here you need the odds ratio, and the percent of patients with the outcome in the group without the predictor (unexposed group). It will adjust the odds to reflect the risk based on the formula provided by Zhang and colleagues in JAMA (reference provided in the file).
Tab 13: Diagnostic tests
This tab has gotten more and more complicated over the years. If you want to calculate the diagnostic accuracy of a new test in reference to a gold standard, this is for you. Input your data in the 2×2 table at the top and it will calculate sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, and the diagnostic odds ratio. It will also provide binomial confidence intervals (95%) for all but the likelihood ratios. After this, you can calculate the post test odds/probability using two different methods: if you know the prevalence, or if you have the prevalence and likelihood ratios from the literature.
Tab 14: Compare diagnostic tests
This tab uses McNemar’s method for calculating P-values for equality of sensitivities and specificities of two different diagnostic tests.
Tab 15: Reliability
This tab calculates the Kappa statistic for the percent agreement between two things. Typically this is the agreement between two people.
Tab 16: Control chart overview
Control charts rule. This tab gives an overview of the final three tabs: g charts, u charts, and p charts. We use these a lot in healthcare epidemiology and infection prevention for outbreak detection. They can be used for anything though and are always better than plain old line charts when you have between 25 and 50 data points.
Tab 17: Rules and Abbreviations
This tab gives the rules for detection of special cause variation on the control charts (Montgomery Rules)
Tab 18: g chart
The g chart is used to plot the time between events (similar to a t chart… actually the g chart is used for plotting the number of events between events, but who is counting). For example, it can be used to plot the number of days between surgical site infections. It is awesome when you have rare events and a regular chart really makes no sense. You just input the date of the event and it does the rest.
Tab 19: u chart
This chart is used when you are plotting data that follow a Poisson distribution. Typically this means that you are counting events (e.g. infections) when the numerator can happen more than once to a patient in the time period. In healthcare epidemiology, we use this chart for device-associated infections, when one patient can get multiple infections per time period. Just input the month (or quarter, year, etc), number of infections in that time period, and the number of ‘denominators’ in that same time period (central line days, ventilator days, etc.)
Tab 20: p chart
This is similar to the u chart but should be used when the data fit a binomial distribution. Typically we use these when the numerator is only counted once per patient in a time period. In healthcare epidemiology, we use them for microbiological surveillance as well as for compliance with hand hygiene and other compliance measures. You input the data the same way as for the u chart.
Items on the list for the next revision
On the next revision (version 1.3), I will be adding a sample size calculator for non-inferiority studies, as well as a page for non-inferiority analysis using the confidence interval and non-inferiority margin. Hopefully I’ll be able to add the same for equivalency studies too. I will also be adding some tweaks for the rest of the tabs. I also hope to add a post hoc power calculator given sample size, level of significance, etc. Finally, I’m going to clarify the comparison of diagnostic tests (or remove it because it is very confusing), and change the g chart to a t-chart.
That’s it! Enjoy!
BTW: Here is the file: Statistical Tests_Current