CTRSU Journal Club, November 21, 2014
Howdy Partners. In this week’s journal club, I reviewed one of my most despised published papers of all time:
Effect of daily chlorhexidine bathing on hospital-acquired infection by Climo et al., NEJM 2013; 368(6): 533-542.
Boy, this is a terribly done study. Aside from the clear issues with scientific writing (the objective doesn’t match the methods, which do not match the results, and vice versa), there are violently terrible choices in the statistical methodology, which is what I will focus on in this post.
- There is no sample size calculation/power analysis. This is critical for cluster randomized crossover studies so we can interpret the results. They even mention something later in the manuscript about how the data are likely to have occurred because there weren’t many cases of VRE or MRSA. Give me a break – this is an elementary error – the reviewers and editor should have put a stop to that.
- There is no washout period. Why????? This is critical for a “bioburden reduction” intervention evaluating infection as the outcome. There was absolutely the risk of contamination effects from the treatment arms into the control arm.
- Sites were dropped due to “non compliance”, but assessment and calculation of non-compliance was not documented. We have no idea how valid the data are without these cases.
- It appears due to the way the methods are written that SAGE provided education for the treatment arm along with the product. If equivalent education was not provided to the control arm, we have no idea if any of the results are due to the treatment or this extra intervention.
- Where are the patient characteristics? There are cluster level statistics but not patient level. This is critical to help interpret the results and try to generalize.
- The biggest issue is here: they did not account for clustering and crossover in their statistical analysis – this makes the results completely invalid and useless. This is not a new design – there are many studies documenting the appropriate approach to analysis of these studies – why did they not account for any clustering and non-independence?? This makes absolutely no sense at all.
- Beyond the inappropriate analysis not accounting for clustering/crossover, I think their models are also inappropriate. There is no discussion of assessing the critically important assumption of the Poisson model: mean=variance for the dependent variable (unlikely to have been met) – and the description of variables used for adjustment is weak – how were these selected? A hierarchical model accounting for cluster-level and patient-level variables would have been much more appropriate, or they should at least have corrected for the assumption violation if indeed it was violated.
- Kaplan-Meier curves were presented but not discussed in the methods – the outcome isn’t mortality, so how did they account for competing risks? It appears they didn’t.
- Cox regression is inappropriate as used here – first, they didn’t account for clustering and didn’t describe model building procedures. The KM curves cross as well – this, along with the statement identifying patients with a longer length of stay as having a different hazard of the outcomes, makes me think the proportional hazards assumption was violated. If so, this model is not correctly used. They could have used some time-dependent covariates, but I would rather have seen a parametric accelerated failure time model – but again, they need to account for within-cluster correlation.
- No limitations section? Give me a break.
- They indicated a couple of times that there were certain percent differences in the study that were statistically significant with P-values of 0.05. This is interesting. First, confidence intervals around the rates would have been MUCH more useful than the P-value so we can assess true differences and not the weak P-value. Second, they do not state their cutoff for a “significant” P-value. Usually it is <0.05, making this actually not significant. Even if they accept 0.05 as significant (P≤0.05 is significant, which is ok), they should have provided the third decimal place. In fact, they do provide a P-value of 0.007 later on (to the third decimal place). If they were only providing two decimal places, this should have been reported as 0.01. The fact that it is reported to three decimal places makes me wonder if they were being less than ethical and rounding down the others. For example, a reported P-value of 0.05 could have been 0.049 or 0.051. Either would be rounded to 0.05 but only one would actually be significant.
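The rounding point in the last bullet is easy to check. The following is a minimal sketch; the helper name `reported` is made up for illustration, and the values are the ones quoted in the bullet, not recomputed from the study data.

```python
def reported(p: float, places: int = 2) -> str:
    """Format a P-value the way it would appear in print at a given precision."""
    return f"{p:.{places}f}"

# Two different underlying P-values collapse to the same printed value
# at two decimal places, and 0.007 would have been printed as 0.01.
for p in (0.049, 0.051, 0.007):
    print(p, "->", reported(p, 2), "(2 places),", reported(p, 3), "(3 places)")
```

At two decimal places a reader cannot tell 0.049 from 0.051, which is why the third decimal place matters when the value sits right on the cutoff.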
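To put a rough number on why ignoring clustering matters (the bullet flagged above as the biggest issue), here is a sketch of the standard design effect for a parallel cluster randomized trial, 1 + (m - 1) × ICC. The cluster size and ICC below are invented for illustration and are not taken from the Climo paper. A cluster randomized crossover design needs a more elaborate formula that also involves the between-period correlation within a cluster, so treat this only as a sense of scale.

```python
import math

def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    if cluster_size < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("cluster_size must be >= 1 and icc must be in [0, 1]")
    return 1.0 + (cluster_size - 1.0) * icc

def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n_total / design_effect(cluster_size, icc)

def inflated_standard_error(naive_se: float, cluster_size: float, icc: float) -> float:
    """Standard error after correcting a naive (independence) estimate."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))

# Hypothetical values: 500 patients per unit, ICC of 0.01.
deff = design_effect(cluster_size=500, icc=0.01)
print(f"design effect: {deff:.2f}")
print(f"effective n for 7,000 patients: {effective_sample_size(7000, 500, 0.01):.0f}")
```

Even a small ICC inflates the variance substantially when clusters are large, so an analysis that treats patients as independent will report confidence intervals and P-values that are too optimistic.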
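The Poisson mean=variance assumption mentioned above can be screened crudely with the ratio of the sample variance to the sample mean of the counts. This is a sketch with made-up counts, not data from the study. A proper check would look at the residual deviance or Pearson chi-square divided by its degrees of freedom after fitting the model, and would consider a negative binomial or quasi-Poisson alternative if that ratio is well above 1.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; roughly 1 under a Poisson model."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when all counts are zero")
    return variance(counts) / m

# Hypothetical infection counts per unit-period (illustrative only).
example_counts = [0, 1, 0, 3, 0, 7, 2, 0, 12, 1]
ratio = dispersion_index(example_counts)
print(f"variance/mean = {ratio:.1f}")
if ratio > 1.5:  # arbitrary screening threshold, not a formal test
    print("counts look overdispersed relative to Poisson")
```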
Based on these issues, the conclusions are way overblown. This study doesn’t prove anything, except that they should have had a stronger statistician evaluate or at least review the protocol before attempting to publish.
There are many, many more issues including clinical issues here (timing of catheter, catheter placement, etc), but I don’t have all day to complain.
Overall, this study puts the last nail in the coffin, showing that the peer review system is highly flawed, even for “the best medical journal”, the NEJM. We need much stronger oversight for scientific publication. The difficulty of publishing negative studies, the influence of industry, and the clear lack of attention to methods really make me depressed.