Effect of daily chlorhexidine bathing on hospital-acquired infection, by Climo et al. NEJM 2013;368(6):533-542.

Boy, this is a terribly done study. Aside from the clear issues with scientific writing (the objective doesn’t match the methods, which don’t match the results, and vice versa), there are violently terrible choices in the statistical methodology, which is what I will focus on in this post.

- There is no sample size calculation/power analysis. This is critical for cluster randomized crossover studies so we can interpret the results. They even mention later in the manuscript that the results are likely explained by there not having been many cases of VRE or MRSA. Give me a break – this is an elementary error – the reviewers and editor should have put a stop to that.
- There is no washout period. Why????? This is critical for a “bioburden reduction” intervention evaluating infection as the outcome. There was absolutely the risk of contamination effects from the treatment arms into the control arm.
- Sites were dropped due to “noncompliance,” but the assessment and calculation of noncompliance were not documented. We have no idea how valid the data are without this information.
- From the way the methods are written, it appears that Sage provided education for the treatment arm along with the product. If equivalent education was not provided to the control arm, we have no idea whether any of the results are due to the treatment or to this extra intervention.
- Where are the patient characteristics? There are cluster level statistics but not patient level. This is critical to help interpret the results and try to generalize.
- The biggest issue is here: they did not account for clustering and crossover in their statistical analysis – this makes the results completely invalid and useless. This is not a new design – there are many studies documenting the appropriate approach to analysis of these studies – why did they not account for any clustering and non-independence?? This makes absolutely no sense at all.
- Beyond the inappropriate analysis not accounting for clustering/crossover, I think their models are also inappropriate. There is no discussion of assessing the critically important assumption of the Poisson model – that the mean of the dependent variable equals its variance (unlikely to have been met) – and the description of the variables used for adjustment is weak: how were they selected? A hierarchical model accounting for cluster-level and patient-level variables would have been much more appropriate – or at least a correction for the assumption violation, if it was indeed violated.
- Kaplan-Meier curves were presented but not discussed in the methods. The outcome isn’t mortality, so how did they account for competing risks? It appears they didn’t.
- Cox regression is inappropriate as used here. First, they didn’t account for clustering and didn’t describe their model-building procedures. The KM curves also cross – this, along with the statement that patients with a longer length of stay had a different hazard of the outcomes, makes me think the proportional hazards assumption was violated. If so, the model was used incorrectly. They could have used time-dependent covariates, but I would rather have seen a parametric accelerated failure time model – and again, they need to account for intraclass clustering.
- No limitations section? Give me a break.
- They indicated a couple of times that certain percent differences were statistically significant with P-values of 0.05. This is interesting. First, confidence intervals around the rates would have been MUCH more useful than the P-value, so we could assess the magnitude of the differences rather than lean on a weak p-value. Second, they do not state their cutoff for a “significant” P-value. Usually it is <0.05, which would make this actually not significant. Even if they accept P≤0.05 as significant (which is defensible), they should have provided the third decimal place. In fact, they do provide a P-value of 0.007 later on (to the third decimal place). If they were only reporting two decimal places, that value should have appeared as 0.01. The inconsistency makes me wonder if they were being less than ethical and rounding down. For example, this P-value could have been 0.049 or 0.051. Either would round to 0.05, but only one would actually be significant under the usual cutoff.
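On the Poisson point: checking the mean = variance assumption takes only a few lines of R. This is a sketch with made-up counts (not the study’s data); when the variance far exceeds the mean, a quasi-Poisson or negative binomial model is far more defensible:

```r
# Made-up infection counts, illustrative only
counts <- c(0, 1, 0, 2, 5, 0, 9, 1, 0, 3)
mean(counts)  # 2.1
var(counts)   # ~8.5 -- variance far exceeds the mean

# A quasi-Poisson fit estimates the dispersion directly
fit <- glm(counts ~ 1, family = quasipoisson)
summary(fit)$dispersion  # values well above 1 flag overdispersion
```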
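And on the proportional hazards point: this is trivial to check with `cox.zph()` from the survival package. A sketch using R’s built-in lung data (again, not the study’s data):

```r
library(survival)

# Schoenfeld-residual test of proportional hazards on example data
fit_cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
cox.zph(fit_cox)  # small p-values flag covariates whose hazards are non-proportional
```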

Based on these issues, the conclusions are way overblown. This study doesn’t prove anything, except that they should have had a stronger statistician evaluate – or at least review – the protocol before attempting to publish.

There are many, many more issues including clinical issues here (timing of catheter, catheter placement, etc), but I don’t have all day to complain.

Overall, this study drives the last nail into the coffin: the peer review system is highly flawed, even at “the best medical journal,” the NEJM. We need much stronger oversight of scientific publication. The difficulty of publishing negative studies, the influence of industry, and the clear lack of attention to methods really make me depressed.


The ingenious duders over at Twitter recently released their BreakoutDetection tool (https://blog.twitter.com/2014/breakout-detection-in-the-wild) to evaluate anomalies and mean shifts in large time-series datasets. The first thing that came to my mind was: will this work for outbreak detection? Of course it should – I regularly use statistical process control charts to monitor disease incidence (particularly of healthcare-associated infections), but I have never been completely sold on SPC alone. Furthermore, if this works appropriately, it could be applied more broadly to cloud-based surveillance systems to more accurately detect outbreaks of anything (Ebola, anyone? NHSN data? etc.).

Although today is my first day back in the office from a six-week paternity leave, I had to try this out on the CDC historical flu data to compare to their age-old time series plot with its “epidemic threshold” confidence bands (my brain is only partially working from lack of sleep and lack of R use for a while).

Let us install the R package first:

install.packages("devtools")
devtools::install_github("twitter/BreakoutDetection")
library(BreakoutDetection)

Next, get the CDC data from here (I rename it data, because I like to repurpose all of my code with generic datasets, particularly for these test cases): View Chart Data (http://www.cdc.gov/flu/weekly/weeklyarchives2014-2015/data/NCHSData.csv)

Read that business into R:

data <- read.csv("/Users/timothywiemken/Desktop/data.csv")

Run the breakout function and view the plot. Note I’m using a very small min.size just for the heck of it. Also, I’m using a 2% increase for anomaly detection. This is pretty small – I have tried a bunch of other values and get reasonably similar results (a few more detections with the small percent – mostly in the first couple of years – which I think is more useful to determine what is happening with more precision).

res = breakout(data$Percent.of.Deaths.Due.to.P.I, min.size=4, method='multi', percent=0.02, degree=1, plot=TRUE)
res$plot

The plot is nice – and the Twitter folks have a much cooler plot on their blog post. Unfortunately they didn’t provide the code for the fancier plot and I don’t have the time right now to recreate it. I also don’t use ggplot a ton (the plot is default ggplot), so I gave up after a couple of seconds trying to get the x tick labels to show up as the week/year. Sue me (please don’t).

Either way, I think this is pretty useful and appears to be accurate (sorry for the poor-quality figures; WordPress seems to destroy my image compression).

Keep up the good work pals.

CDC FluView Plot

Same data, using Twitter BreakoutDetection Algorithm:

Obviously my plot can be made much fancier, but I’m busy enjoying my new daughter Indigo!

Take care pals – and wash your hands. When you think you have washed them long enough, add 10 more seconds.

*** 10/28/2014 UPDATE

I was able to relabel the x axis. ggplot is pretty decent. Good job Hadley.

code (can probably be simplified):

# create an object of the plot so I don't have to use $
stuff <- res$plot
# create the x labels
data$wkyr <- paste(data$Week, data$Year, sep="-")
# get every 10th observation and put it into a new vector of just the week/year labels for the plot
sub1 <- data[seq(1, nrow(data), by=10), ]
wkyr2 <- sub1$wkyr
# replot
library(ggplot2)
stuff + labs(y="Percent of All Deaths Due to \nPneumonia and Influenza", x="Week-Year") +
  scale_x_continuous(breaks = seq(from = 1, to = 261, by = 10), labels = wkyr2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

(NOTE: ggplot prints a message about the plot already having an x-axis scale, but it still works. My guess is there is a way to combine the scale_x_continuous and the theme to get the labels in the right place and rotated.)

PEACE!

Tim

Timothy Wiemken PhD MPH CIC

Assistant Professor of Medicine

Assistant Director of Epidemiology and Biostatistics

University of Louisville School of Medicine

Division of Infectious Diseases

Clinical and Translational Research Support Unit

501 E. Broadway, #120B (not for much longer – moving down the hall on Monday Nov 3)

Louisville, KY 40202


@timwiemken

tlwiem01@louisville.edu


As we have been building this system, I have been thinking about how we could use interactive visualization to help readers better understand our research when we publish. The current paradigm for publication means a paper copy of an article or a PDF of a paper copy of an article. The problem with this is that we are ignoring a much richer publishing environment than paper provides. In particular, there are so many visualization techniques that could make data come alive for the reader and enhance our ability to make sense of it. Some really good examples of interactive data visualizations can be found at the D3 website (http://d3js.org/). D3 is a Javascript library for writing visualizations, and the website has a portfolio so you can see several of the options.

The problem is, many of the visualizations highlighted on the D3 website are unfamiliar in the biomedical academic world. Traditionally this discipline uses a fairly limited set of visualizations (e.g. bar charts, line charts, boxplots). So I suggest a paradigm shift in which biomedical researchers explore using more sophisticated, interactive visualizations in their research, and ALL publishers provide policies and infrastructure to support these visualizations. Naturally, over time a standard will need to develop for the language and approach to interactive visualization so publishers do not have to support an excessive number of tools on their infrastructure. But once that happens, our research can be much more interactive and perhaps more illustrative to our readers.

Rob

The journal *Science* recently published an article titled “Who’s Afraid of Peer Review?”, which centers on issues related to the open access publication system. This post will focus on the review process and the results of the *Science* article.

Anyone who conducts research knows how the peer review process works. Essentially, it is a pain. For research to be meaningful, it must be disseminated, and unfortunately the only acceptable dissemination of research in academia and industry is through peer-reviewed publication. Without dissemination, one cannot get grants and will likely not be promoted. The downfall of this system is that there really are not that many journals, which means important research may not make it to publication just because there is nowhere to publish it. All journals are inundated with submissions. The editor screens the article and either rejects it immediately, based on his or her view of its merits and fit with the journal, or sends it out to peer review.

The peer-review process is fraught with troubles. One unfortunate issue is that research really is often reviewed by your peers – I mean your best friends. When you submit an article, most journals ask for the names of suggested reviewers. This is because it is difficult to find people who are willing to spend their time reviewing research for free. However, when reviewers are suggested, it can be anyone – including friends who will look favorably on the work – after all, why would anyone suggest people who would not like their paper? Why could this be an issue, you ask? Well, most journals do not have a blinded review, meaning that the reviewers know who wrote the article. If the authors are big names or friends, human biases suggest the review would be more favorable. Whether this truly is or isn’t the case has not been evaluated.

Regardless, to help alleviate these issues, an open access publication stream has recently exploded into the research world. On the surface, open access is an excellent idea. First, it adds thousands of “journals” to the landscape, which could serve to limit the issue described above. Second, it makes “published” articles available free of charge. This helps investigators who do not have access to an academic library where most journals are readily available. Unfortunately, the system has come with a price. A steep price, in fact. Since traditional journals owned by a publishing company make money through advertising, open access journals miss out on this income stream. Therefore, they charge anywhere from $50 to several thousand dollars to review and/or “publish” an article (the word publish is in quotes because these journals are online and not truly published on paper as with a traditional journal). These costs are often impossible for a young investigator or an investigator from a developing nation to pay. However, the journal needs this income stream to properly edit and typeset the manuscript.

Unfortunately, the open access system has been nearly ruined by a number of bogus “journals” out to capitalize on the necessity of getting research into publication. The *Science* article outlines this perfectly. Essentially, the author, under multiple pseudonyms, made up data and submitted clearly flawed research to 304 open access journals. At the end of the day, over half accepted the manuscript without any clear peer review of the science in the article — even journals considered “good.”

Open access is a necessity in today’s research climate. Unfortunately, it has become more of a pyramid scheme of shady publishers exploiting researchers who are out to disseminate their research findings. Hopefully this can be remedied, but significant damage has already been done.

Bohannon J. Who’s Afraid of Peer Review? Science. 4 October 2013;342:60-65.

In my last post I laid out how to construct a research question. Once you have a clear study question the next step is figuring out how you are going to answer that question. For this you need to determine the appropriate study design for your project. In this post I’ll go through several categories of study design and examples of appropriate designs for certain types of studies.

There are two basic categories of study design, observational and interventional. Observational studies are exactly what you would suspect. The investigators observe what is naturally occurring in a system. The investigator tries to interfere with the system as little as possible.

There are several types of observational studies. Cohort studies are a type of observational study that follows a group over time, either prospectively or retrospectively. They often describe the incidence or natural history of a condition, commonly a chronic disease, and analyze predictors or risk factors for various outcomes. Retrospective cohort studies are good for investigating causes of conditions and defining their incidence, and they are generally much less costly in terms of time and money. Investigators are limited in retrospective studies by the quality of the variables they can collect: you can only analyze data that have already been collected. If it wasn’t collected, you can’t go back and get it, so if you’re looking at a variable that is rarely collected you may want to do a prospective study. Prospective cohort studies are also good for defining incidence and risk factors, and investigators have the advantage of collecting variables exactly how they want them to be collected. Unfortunately, prospective cohort studies are a very inefficient method of studying rare outcomes. Ten years of prospective data requires ten years to collect; ten years of retrospective data can be collected in a fraction of the time, saving time and money.

A second type of observational study is the cross-sectional study. In a cross-sectional study the investigator makes all measurements at a single moment in time for each subject. This is good to investigate prevalence of a variable at a point in time but you obviously can’t track change in variables over time.

A third type of observational study is the case-control design. This design classically has two groups defined by outcome: group 1 has the outcome of interest while group 2 does not. This design gives a high yield of information from relatively few subjects, and it can be an efficient design for investigating diseases with long latent periods, but it can’t measure incidence or prevalence.

The second category of study design is the interventional study. These studies insert the investigator into the system being analyzed. The investigator generally intervenes in one group of subjects while not intervening in the other. Interventional studies are subcategorized by the method of segregating the groups. Subjects can be segregated randomly, generally by a computer program, or non-randomly, as in subject self-selection when a subject volunteers for the intervention group. Randomized trials are preferred because randomization controls for confounding variables that the investigator is unaware of or is not collecting.

Interventional studies are also classified by who knows which subjects receive the intervention in question. If neither the subject nor the investigator measuring the variable is aware of the subject’s grouping, the study is double-blinded. If the subject is unaware of their grouping but the investigator measuring the variable is aware, or vice versa, the study is single-blinded. Pharmacists, technicians, physicians, etc. can all be blinded to prevent bias from being introduced into study results. The “gold standard” of interventional studies is the randomized double-blind trial, but these tend to be very costly and complicated.

In the end, the best study design is the one that answers your study question, because performing a study only to find you cannot answer your research question is a waste of both time and money – scarce resources for most researchers.

In the next post I’ll outline the creation of your study protocol. As always, if you need assistance or advice on any part of the research process in the meantime, don’t hesitate to contact us here at the center. My contact information is below. We’d be glad to help.

Daniel Curran MD

Clinical Research Fellow

Clinical and Translational Research Support Center

University of Louisville Division of Infectious Diseases

daniel.curran@louisville.edu

Office: 502-852-0683

**1. Why do you want to have a new website?**

This is an obvious yet critical question to ask. Do you want to focus primarily on promoting your organization? Provide information? Sell products? Provide a service? A combination of these? The focus of your website will affect everything about it, from design to content.

**2. Describe your organization in a few sentences.**

This information is useful for creating a home page as well as giving a user who happens to stumble upon your website a general idea of where they are.

**3. Describe the typical user who will view your website.**

Age, gender, conditions, and user expectations all play important roles in the development process. Will your users be tech savvy and highly adaptable? Or will they look at a computer and cry?

**4.** **Who is in charge of providing website content?**

Will you decide as a group or will one person be responsible? It is important to decide who is responsible for being the spokesperson to the developer so that conflicting inputs are not a problem and the developer knows whom to contact. Time is wasted and confusion is generated when a developer is contacted by different people providing different answers to the same problem.

**5. What sets your organization apart from the rest of the herd?**

What is special about your company? Do you provide a specific service that is rarely found elsewhere? Answer with something that shows the value your organization offers its users.

I’m sure my title caught your attention, or at least made you stop and think “okay, who does this EBP think she is???”. But don’t be haters because I am going to talk about the meaning of life. In any case, I will move things along here and just say that the answer is “42”………….if you are still scratching your head, go to the nearest bookstore NOW and pick up your own copy of The Hitchhiker’s Guide to the Galaxy. It will be absolutely worth it, especially if you are convinced mice are a lot smarter than we think and you like depressed robots…….don’t question what I just wrote….just get the book.

Now that I’ve sucked you into my blog, I can now talk to you about my real “meaning of life” post: **REGRESSION MODELS!!!!!** woohoooo!!! I will first start out with simple terms, and next blog I will plan on continuing my talk on regression models building off of what was learned here.

So what is regression, exactly? Regression analysis is used to estimate the conditional expectation of a dependent variable (an outcome such as disease/no disease or death/no death) given one or more independent variables (such as gender, age, or smoking status). Depending on how many independent variables are involved in your study, you can have either a **Simple Linear Regression** or a **Multivariable Linear Regression**. Regression models are not only for continuous outcomes, however. There are many other regression models to be aware of, and which one to use depends on the nature of your outcome variable:

> when the outcome variable is continuous – perform Linear Regression

> when the outcome variable is categorical (e.g. disease/no disease) – perform Logistic Regression

> when the outcome variable is a count – perform Poisson Regression

> when the outcome variable is a time to event (aka time to disease or death) – perform Cox Regression

> most of these are special cases of the Generalized Linear Model (GLM), an extension of linear regression that accommodates non-normal outcomes (binary, counts, etc.) through link functions

Understanding what your variables are is extremely important in order to choose the correct model for your analysis. For example, let’s look at choosing logistic regression. I would first have to verify that my outcome variable is categorical (dichotomous). I would also have to remind myself that this model yields odds ratios for the occurrence of the outcome. Logistic regression can also be verified as the correct model after identifying what type of study is being performed. Case-control studies, which are good for rare diseases, yield odds ratios (as opposed to cohort studies, which can yield relative risks). Thus, if I know my study was a case-control study, then I should automatically know that logistic regression would be an appropriate model for this particular analysis.
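As a concrete sketch of this in R (simulated data, not from any real study), fitting a logistic model and pulling out the odds ratio looks like this:

```r
set.seed(1)
# Simulated dichotomous exposure and outcome
exposure <- rbinom(200, 1, 0.4)
outcome  <- rbinom(200, 1, plogis(-1 + 0.8 * exposure))

# Logistic regression: coefficients are log-odds, so exponentiate
fit <- glm(outcome ~ exposure, family = binomial)
exp(coef(fit)["exposure"])               # odds ratio for the exposure
exp(confint.default(fit)["exposure", ])  # Wald 95% CI for the OR
```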

Further, the work does not stop once you have picked a regression model. There are many more factors to take into account to make sure your model is sound, such as adjusting for confounding factors and effect modifiers, to name a few. Next time around, I will visit these concepts and continue to expand on the complexity of “the meaning of life”…regression models.

Today I wanted to write about processing data analysis requests. For a small organization with maybe one analyst/statistician, a formal system for handling requests may not be necessary. However, as the organization grows, multiple investigators with multiple lines of research may need several different analyses from several different data sets. Managing these requests can be time-consuming and problematic, and the risk is that an analysis is not completed in time for a deadline. Therefore, your analysis team should develop a consistent process/workflow that is understood by both the team and the investigators requesting analyses.

An example of a workflow might be:

- The investigator submits a written request for analysis that includes all of the pertinent information required to complete the analysis.
- The analysis team approves or rejects the request based on its content. Rejection may be because the request is incomplete or additional items need to be considered.
- The analysis team conducts the analysis and generates relevant statistics, output, and figures (essentially the method and results sections of a publication).
- The investigator receives the output of the analysis which is then used to write a manuscript for publication.

There may be other elements of this process – for instance some organizations may require a senior member of the investigator team to approve all analysis requests in addition to the approval process by the analysis team.

This process can easily be implemented on paper using printed request forms but could also be implemented via a web-application. In fact, for the CTRSC, this is exactly what we have done with our Investigator Research Tool (IRT). The goal of the IRT is to provide an electronic method for making an analysis request and receiving the output. The advantage of using this system over paper is that we will be able to more easily collect statistics on the requests we have received. This will allow us to identify bottlenecks in our process and improve efficiency.

We have just rolled out the IRT, so I don’t have statistics yet. In the spring I will write again about how things have fared with our system and what problems we encountered.

Until next time . . .

Rob

The answer to the subject line…

Yes. If you are in the old school group that thinks Excel is for kids, it is time to rethink your stance. Excel has improved considerably over the years and should now be considered quite useful for basic statistical methods. Of course, I’m not recommending Excel for anything fancy (e.g. regression) unless you are talking about using an add-on package. These methods are a bit more complicated than Excel can typically handle. Even if you can get results pumped out for a regression in the base Excel software, you will be extremely limited in your model diagnosis.

You might be thinking: so you are saying we can use Excel, but we shouldn’t use Excel? Well, yeah… pretty much. For basic statistics, Excel is an excellent choice since it is really easy to use. This ease of use comes with a caveat though – you have to be relatively well versed in Excel formulas to get what you need out of it. Because of these issues, I created an Excel workbook that helps investigators do a number of basic statistical tests without the need to type in their own formulas. In this post, I will go through each tab in this workbook. A link to the workbook will also be posted. I am always looking for new things to add to the file, so if you find something useful that is not currently available, send me an email and I will see about adding it in the next revision.

*Tab 1: Cover*

Tab 1 is just the cover page. It keeps the version and date of last revision and my contact information, as well as a note that you should not use this for your own monetary benefit. If you do, I’ll track you down and steal something from you to sell.

*Tab 2: Sample size continuous*

Tab 2 has a sample size calculator for superiority studies where your outcome variable is a continuous value and you have two study groups. You can choose between a level of significance of 0.01 and 0.05 (nothing else yet!), as well as a power of 90% and 80% (nothing else yet!). Be careful to enter the data as indicated (decimal format, not percent!). Next you enter your mean expected value for the outcome in your two study groups. Finally, you enter in the standard deviation for the outcome. At the moment, the assumption for this file is that the standard deviation is the same for both groups. Hopefully I’ll be able to add both standard deviations in a future release. You can also enter in an expected drop-out rate if you are so inclined. It will calculate the sample size for each option, as well as adding a 5% and 10% continuity corrected sample size.
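For readers who want to sanity-check the workbook, base R performs the same calculation (this sketch assumes equal SDs and a two-sided test, as the tab does; the numbers are made up):

```r
# Detect a 10-unit difference in means, SD of 15 in both groups,
# alpha = 0.05, power = 80%
power.t.test(delta = 10, sd = 15, sig.level = 0.05, power = 0.80)$n
# roughly 36 subjects per group, before any drop-out inflation
```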

*Tab 3: Sample size categorical*

Tab 3 is the sample size for superiority studies where your outcome variable is a categorical variable (dichotomous for now) and you have two study groups. You enter the data just as you do for tab 2, but you also get the option to change the ratio of patients in each of your two study groups. Be careful to read the instructions so you do the ratio correctly! The default is 50% in each group.
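The dichotomous-outcome version can likewise be cross-checked in base R (this sketch assumes equal allocation and a two-sided test, with made-up proportions):

```r
# Detect a difference between 10% and 20% outcome rates,
# alpha = 0.05, power = 80%
power.prop.test(p1 = 0.10, p2 = 0.20, sig.level = 0.05, power = 0.80)$n
# roughly 199 subjects per group
```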

*Tab 4: 2×2 table*

Tab 4 allows you to calculate chi-squared statistics for a 2×2 table, as well as odds and risk ratios with confidence intervals, and the number needed to treat (NNT)/number needed to harm (NNH). You enter your data in the blue cells and everything is auto-calculated. You can also examine the table of ‘expected values’ to see if the data meet the assumptions of the chi-squared test. If they do not, you will need to use Fisher’s exact test, which is not currently available. The NNT/NNH will be shown as appropriate (e.g. if there is a protective effect of your main variable, the NNT will be shown and the NNH will be blank). Be sure to input your data as shown: the positive risk factor (exposure) goes on the top row, non-exposure on the bottom row, outcome positive in the left column, and outcome negative in the right column.
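As a sanity check against this tab, the same quantities can be computed in base R (hypothetical counts, exposure on the rows and outcome on the columns):

```r
# Hypothetical 2x2 table: 30/70 outcomes among exposed, 15/85 among unexposed
tab <- matrix(c(30, 70, 15, 85), nrow = 2, byrow = TRUE,
              dimnames = list(c("exposed", "unexposed"), c("outcome+", "outcome-")))

chisq.test(tab, correct = FALSE)$p.value  # chi-squared test without continuity correction
or  <- (30 * 85) / (70 * 15)              # odds ratio, ~2.43
rr  <- (30 / 100) / (15 / 100)            # risk ratio, 2.0
nnt <- 1 / abs(30 / 100 - 15 / 100)       # ~6.7: treat about 7 to change one outcome
```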

*Tab 5: 2×2 totals*

This tab is the same as the prior tab, except you don’t need the same four data points. Here, you can calculate the same data if you have the total with and without the outcome and the total with and without the outcome with the exposure. Sometimes this is just easier than finding the data for tab 4.

*Tab 5: Compare 2 means*

This tab compares the mean values of two variables using an equal-variance, independent-samples Student’s t-test. You only need the sample size in your two groups, as well as the mean and standard deviation in both groups.
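The same test can be reproduced from summary statistics alone; here is a small helper with hypothetical numbers, assuming equal variances as the tab does:

```r
# Pooled-variance t-test from group summaries (equal-variance assumption)
t_from_summary <- function(m1, m2, sd1, sd2, n1, n2) {
  sp2 <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2)  # pooled variance
  t   <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
  2 * pt(-abs(t), df = n1 + n2 - 2)  # two-sided p-value
}

t_from_summary(m1 = 10, m2 = 12, sd1 = 4, sd2 = 4, n1 = 30, n2 = 30)  # just above 0.05
```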

*Tab 6: Compare 2 rates*

This tab compares two incidence rates using a Chi-squared test. You need the total numerator and person-time for your two groups.

*Tab 7: Cleaning efficiency (ATP)*

This tab is for the infection preventionists, environmental services professionals, and healthcare epidemiologists. It essentially calculates a percent difference between two ATP (adenosine triphosphate) readings when examining the cleanliness of a patient’s room. It can be used for anything though really.

*Tab 8: Percent difference*

This is similar to the ATP tab, but more generic. It calculates a percent difference between two values. If there is an increase, you will see a value in the percent-increase cell; if there is a decrease, in the percent-decrease cell.

*Tab 9: 2×2 confounding*

This tab is slightly more complicated. It tells you if you have a confounding variable (one at a time!) and will adjust for it if needed – for when your outcome, predictor, and confounding variables are all dichotomous. You start by entering your data just as you did in tab 4. After this, you must stratify the data in the first 2×2 table by the two levels of your potential confounding variable. After that, the rest is automatic. You will see (in words) if confounding is present based on if the confounding variable is related to both the predictor and outcome and if the crude risk or odds is ≥10% different than the adjusted risk or odds (Mantel-Haenszel method).
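The stratified adjustment this tab performs is the Mantel-Haenszel method, which base R exposes directly. A sketch with hypothetical counts, one 2×2 table per stratum of the confounder:

```r
# dim = c(2, 2, 2): exposure x outcome x confounder stratum
strata <- array(c(20, 30, 10, 40,   # stratum 1 of the confounder
                  10, 40, 15, 35),  # stratum 2 of the confounder
                dim = c(2, 2, 2))

# Mantel-Haenszel common odds ratio, adjusted for the stratifying variable;
# compare it to the crude OR from the collapsed table to judge confounding
mantelhaen.test(strata)$estimate
```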

*Tab 10: 3×2 table*

Now we are getting fancy. Not really. This tab is the same as the 2×2 confounding tab, except that your predictor variable has three levels instead of two. Enter the data as you did in the 2×2 table tab.

*Tab 11: 95% CI for percentage*

This tab creates a binomial confidence interval for a percentage. Sometimes this is useful.

*Tab 12: OR to RR*

This tab will adjust an odds ratio to reflect the true risk ratio. This is used when you have a cohort study but still calculate an odds ratio (e.g. you used logistic regression to adjust for confounding instead of a more appropriate method that gives you the risk – remember, logistic regression only gives you odds, but in a cohort study you should calculate the risk!). Here you need the odds ratio and the percent of patients with the outcome in the group without the predictor (unexposed group). It will adjust the odds to reflect the risk based on the formula provided by Zhang and colleagues in JAMA (reference provided in the file).
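The Zhang and Yu correction is a one-liner: RR = OR / ((1 − P0) + P0 × OR), where P0 is the outcome proportion in the unexposed group. A sketch (function name is mine):

```python
def or_to_rr(odds_ratio, p0):
    """Approximate a risk ratio from an odds ratio (Zhang & Yu, JAMA 1998).
    p0 = proportion with the outcome in the unexposed (reference) group."""
    return odds_ratio / ((1 - p0) + p0 * odds_ratio)

rr = or_to_rr(3.0, 0.2)  # OR of 3.0 shrinks to an RR of about 2.14
```

Note that when the outcome is rare (P0 near zero) the correction does almost nothing, which is the familiar rare-disease approximation.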

*Tab 13: Diagnostic tests*

This tab has gotten more and more complicated over the years. If you want to calculate the diagnostic accuracy of a new test in reference to a gold standard, this is for you. Input your data in the 2×2 table at the top and it will calculate sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, and the diagnostic odds ratio. It will also provide binomial confidence intervals (95%) for all but the likelihood ratios. After this, you can calculate the post-test odds/probability using two different methods: if you know the prevalence, or if you have the prevalence and likelihood ratios from the literature.
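The core measures all fall out of the 2×2 table, and the post-test probability is just Bayes' theorem on the odds scale. A sketch of both (names and example counts are mine):

```python
def diagnostics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table against a gold standard."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    lr_pos = sens / (1 - spec)   # how much a positive result raises the odds
    lr_neg = (1 - sens) / spec   # how much a negative result lowers the odds
    return sens, spec, ppv, npv, lr_pos, lr_neg

def post_test_prob(prevalence, lr):
    """Pre-test odds x likelihood ratio = post-test odds, then back to probability."""
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)
```

For example, a test with sensitivity and specificity of 0.9 has LR+ of 9; at 50% prevalence, a positive result moves the probability of disease to 0.9.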

*Tab 14: Compare diagnostic tests*

This tab uses McNemar’s method for calculating P-values for equality of sensitivities and specificities of two different diagnostic tests.
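McNemar's statistic only looks at the discordant pairs. For sensitivities you run it on the diseased patients, for specificities on the non-diseased. A sketch (function name is mine):

```python
def mcnemar_chi2(b, c):
    """McNemar chi-squared (1 df) on the discordant pairs:
    b = test A positive / test B negative,
    c = test A negative / test B positive."""
    return (b - c) ** 2 / (b + c)
```

So if test A picks up 10 diseased patients that test B misses, and B picks up 5 that A misses, the statistic is 25/15 ≈ 1.67 on 1 df.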

*Tab 15: Reliability*

This tab calculates the Kappa statistic, a chance-corrected measure of agreement between two things. Typically this is the agreement between two people.
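For two raters making a yes/no call, Cohen's kappa compares observed agreement to the agreement expected by chance from each rater's marginals. A sketch (names are mine):

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table:
    a = both say yes, b = rater 1 yes / rater 2 no,
    c = rater 1 no / rater 2 yes, d = both say no."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from each rater's marginal proportions
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)
```

With 45 agreements in each cell and 5 disagreements each way, observed agreement is 90% but kappa is 0.8, since half of the agreement was expected by chance.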

*Tab 16: Control chart overview*

Control charts rule. This tab gives an overview of the final three tabs: g charts, u charts, and p charts. We use these a lot in healthcare epidemiology and infection prevention for outbreak detection. They can be used for anything though and are always better than plain old line charts when you have between 25 and 50 data points.

*Tab 17: Rules and Abbreviations*

This tab gives the rules for detecting special cause variation on the control charts (Montgomery Rules).

*Tab 18: g chart*

The g chart is used to plot the time between events (similar to a t chart… actually the g chart is used for plotting the number of events between events, but who is counting). For example, it can be used to plot the number of days between surgical site infections. It is awesome when you have rare events and a regular chart really makes no sense. You just input the date of the event and it does the rest.
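The usual g chart limits treat the counts between events as geometric: center line at the mean gap ḡ, with limits at ḡ ± 3·√(ḡ(ḡ+1)), floored at zero. A sketch under that assumption (function name and gaps are mine):

```python
import math

def g_chart_limits(gaps):
    """Center line and 3-sigma limits for a g chart, where `gaps` are the
    counts between events (e.g. days between surgical site infections)."""
    g_bar = sum(gaps) / len(gaps)
    sigma = math.sqrt(g_bar * (g_bar + 1))  # geometric-distribution spread
    ucl = g_bar + 3 * sigma
    lcl = max(0.0, g_bar - 3 * sigma)
    return g_bar, lcl, ucl

center, lcl, ucl = g_chart_limits([10, 20, 30])
```

A gap above the upper limit suggests the process has improved (events are getting rarer), which is the attraction of this chart for rare infections.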

*Tab 19: u chart*

This chart is used when you are plotting data that follow a Poisson distribution. Typically this means that you are counting events (e.g. infections) when the numerator can happen more than once to a patient in the time period. In healthcare epidemiology, we use this chart for device-associated infections, when one patient can get multiple infections per time period. Just input the month (or quarter, year, etc.), the number of infections in that time period, and the number of ‘denominators’ in that same time period (central line days, ventilator days, etc.).
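The standard u chart puts the center line at the overall rate ū = Σcounts/Σdenominators, with per-period limits ū ± 3·√(ū/nᵢ) that widen when a period has little exposure time. A sketch with made-up numbers:

```python
import math

def u_chart(counts, denominators):
    """Per-period u chart values: (rate, LCL, UCL) for Poisson counts per
    unit of exposure (e.g. infections per central line day)."""
    u_bar = sum(counts) / sum(denominators)
    points = []
    for c, n in zip(counts, denominators):
        sigma = math.sqrt(u_bar / n)  # limits depend on each period's denominator
        points.append((c / n, max(0.0, u_bar - 3 * sigma), u_bar + 3 * sigma))
    return u_bar, points

u_bar, points = u_chart([2, 4], [100, 100])
```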

*Tab 20: p chart*

This is similar to the u chart but should be used when the data fit a binomial distribution. Typically we use these when the numerator is only counted once per patient in a time period. In healthcare epidemiology, we use them for microbiological surveillance as well as for compliance with hand hygiene and other compliance measures. You input the data the same way as for the u chart.
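The p chart works the same way, but with the binomial spread p̄(1−p̄)/nᵢ and limits clipped to [0, 1]. A sketch with made-up compliance data:

```python
import math

def p_chart(successes, totals):
    """Per-period p chart values: (proportion, LCL, UCL) for binomial data
    (e.g. hand hygiene compliance per month)."""
    p_bar = sum(successes) / sum(totals)
    points = []
    for x, n in zip(successes, totals):
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        points.append((x / n,
                       max(0.0, p_bar - 3 * sigma),
                       min(1.0, p_bar + 3 * sigma)))
    return p_bar, points

p_bar, points = p_chart([80, 90], [100, 100])
```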

**Items on the list for the next revision**

On the next revision (version 1.3), I will be adding a sample size calculator for non-inferiority studies, as well as a page for non-inferiority analysis using the confidence interval and non-inferiority margin. Hopefully I’ll be able to add the same for equivalency studies too. I will also be adding some tweaks for the rest of the tabs. I also hope to add a *post hoc* power calculator given sample size, level of significance, etc. Finally, I’m going to clarify the comparison of diagnostic tests (or remove it because it is very confusing), and change the g chart to a t chart.

That’s it! Enjoy!

BTW: Here is the file: Statistical Tests_Current


I am a post-doctoral research fellow at the Clinical and Translational Research Support Center. In this post I would like to shed some light on the idea of research in contemporary science. Most research done nowadays is data-driven. The million-dollar question is how to get the data.

In the field of computer networks, most research is based on synthetic data. There are many simulators on the market that can run many kinds of scenarios, or you can generate your own. NS2 and NS3 are open-source simulators, and there are paid network simulators such as OPNET and NetSim, to name a few.

Ad hoc networks are the area of research in computer networks that intrigues me the most. These types of networks arise mostly in the military and disaster-management domains.

So, one can imagine that collecting real data in ad hoc networks is a major problem, and researchers therefore rely on simulations to generate these types of data.

Since moving to the medical school and performing analyses on hospital and patient data, I have found that the phrase “synthetic data” is treated as a joke. It is quite amazing to notice that in one field of research synthetic data is a benchmark, while in another it is not even a consideration.

But networks do occur in patient and hospital data, and many statistical inferences can be drawn from them. During my time here at the center, I have been able to compute some interesting graphs of patient movement within the hospital.

The figure below shows patient movement in a hospital through the various stages of his or her hospitalization. In the figure, various clusters are formed based on the ward in which the patient was during their stay. The dataset of patient movement would be hard to make sense of in its raw form, but thanks to the network – or, more precisely, graph-theoretical – approach, that information is beautifully depicted in the following graph. This graph was created using NodeXL in the Excel environment.
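Whatever tool draws the picture, the underlying data reduce to a weighted edge list of ward-to-ward transitions. A minimal sketch with hypothetical ward names (the sequences below are made up, not from our dataset):

```python
from collections import Counter

# Hypothetical ward sequences, one list per patient admission
stays = [
    ["ER", "ICU", "Ward A"],
    ["ER", "Ward A"],
    ["ER", "ICU", "Ward B"],
]

# Each consecutive pair of wards becomes a directed, weighted edge;
# this edge list is what a tool like NodeXL lays out as a graph
edges = Counter()
for path in stays:
    for src, dst in zip(path, path[1:]):
        edges[(src, dst)] += 1
```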

In conclusion, I would say that here we can see, in a very real sense, the meaning of translational research.

M. S. S. Khan, Ph.D

Clinical and Translational Research Fellow

School of Medicine

Division of Infectious Diseases

University of Louisville

Louisville, KY 40202
