Perspectives on Clinical and Translational Research


BreakoutDetection: Using the new Twitter tool for influenza surveillance

Hey world!  It’s been a while.  Regardless…

The ingenious duders over at Twitter recently released their BreakoutDetection tool (https://blog.twitter.com/2014/breakout-detection-in-the-wild) to detect breakouts (mean shifts and gradual ramp-ups) in large time-series datasets. The first thing that came to my mind was: will this work for outbreak detection? Of course, it should. I regularly use statistical process control (SPC) charts to monitor disease incidence (particularly healthcare-associated infections), but I have never been that sold on SPC alone. Furthermore, if this works well, it could be applied more broadly in cloud-based surveillance systems to detect outbreaks of just about anything (Ebola, anyone? NHSN data?).

Although today is my first day back in the office after a six-week paternity leave, I had to try this out on the CDC historical flu data and compare it with their age-old time-series plot and its “epidemic threshold” confidence bands (my brain is only partially working from lack of sleep and not using R for a while).

Let's install the R package first.
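A minimal sketch of the install, assuming the package still installs from Twitter's GitHub repository (twitter/BreakoutDetection) via devtools, and that you have internet access and a compiler toolchain (the package includes C++ code). The CRAN mirror URL is an assumption; any mirror works.

```r
# Sketch: install devtools from CRAN if needed, then BreakoutDetection from
# GitHub. Assumes internet access and a working compiler toolchain.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools", repos = "https://cloud.r-project.org")
}
devtools::install_github("twitter/BreakoutDetection")

# Load the package so breakout() is available
library(BreakoutDetection)
```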

 

Next, get the CDC data from the “View Chart Data” link (http://www.cdc.gov/flu/weekly/weeklyarchives2014-2015/data/NCHSData.csv). I rename it data because I like to repurpose all of my code with generic dataset names, particularly for test cases like this one.

Read that business into R.
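A minimal sketch of the read step, assuming the CSV behind the link above is still reachable at that URL (CDC may have moved or retired the 2014-2015 archive since) and that it holds the weekly NCHS mortality series behind the “epidemic threshold” chart. Following the naming habit described above, the result goes into a data frame called data.

```r
# Sketch: read the CDC NCHS table into a generically named data frame.
# The URL is the one linked in the post; it may no longer resolve.
url <- "http://www.cdc.gov/flu/weekly/weeklyarchives2014-2015/data/NCHSData.csv"
data <- read.csv(url, stringsAsFactors = FALSE)

# Check column names and types before choosing the series to analyse
str(data)
```

The exact column names are worth checking with names(data), since later steps depend on picking the right one.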

 

Run the breakout function and view the plot. Note that I'm using a very small min.size (the minimum number of weeks between breakouts) just for the heck of it. I'm also using a 2% increase (the percent argument): a new breakout is added only if it improves the goodness of fit by at least 2%. That is pretty small. I have tried a bunch of other values and get reasonably similar results; the small percent gives a few more detections, mostly in the first couple of years, which I think helps determine what is happening with more precision.
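A minimal sketch of the breakout step, assuming a plain numeric vector of the weekly values as input. find_breakouts() is a hypothetical wrapper, and min.size = 5 is an assumed “very small” value, not necessarily the one used here; method = "multi" and percent = 0.02 (the 2% goodness-of-fit increase) are breakout() arguments.

```r
library(BreakoutDetection)

# Hypothetical wrapper around breakout() for a weekly surveillance series.
#   x         numeric vector of weekly values (e.g. percent of deaths due to P&I)
#   min.size  minimum number of weeks between breakouts; 5 is an assumed
#             "very small" value, not necessarily the post's exact setting
#   percent   minimum relative increase in goodness of fit (0.02 = 2%)
#             needed before an additional breakout is added
find_breakouts <- function(x, min.size = 5, percent = 0.02) {
  x <- as.numeric(x)
  # Fail loudly on gaps, unparsed values, or a series too short to split
  stopifnot(length(x) > 2 * min.size, !anyNA(x))
  breakout(x, min.size = min.size, method = "multi",
           percent = percent, plot = TRUE)
}
```

With the CDC table in data, you would pick the column holding the weekly percentage (its exact name is an assumption, so check names(data)), call res <- find_breakouts(data[[col]]), and print res$plot for the default ggplot figure; res$loc holds the detected breakout locations.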

 

The plot is nice, though the Twitter folks have a much cooler plot in their blog post. Unfortunately, they didn't provide the code for the fancier plot, and I don't have time right now to recreate it. I also don't use ggplot a ton (the default plot is a ggplot), so I gave up after a couple of seconds of trying to get the x-axis tick labels to show the week/year. Sue me (please don't).

Either way, I think this is pretty useful and appears to be accurate (sorry for the poor-quality figures; WordPress seems to destroy my image compression).

Keep up the good work, pals.

[Figure: CDC FluView plot]
Same data, using the Twitter BreakoutDetection algorithm:
[Figure: BreakoutDetection plot of the same data]
Obviously my plot can be made much fancier, but I’m busy enjoying my new daughter Indigo!
Take care, pals, and wash your hands. When you think you have washed them long enough, add 10 more seconds.
*** 10/28/2014 UPDATE
I was able to relabel the x-axis. ggplot is pretty decent. Good job, Hadley.
Code (can probably be simplified):
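A minimal sketch of the relabeling, not the exact code from this update. label_weeks() is a hypothetical helper; it assumes the breakout plot has the row index (1, 2, 3, ...) on the x-axis, as it appears to when breakout() is given a plain numeric vector, and that the year and week vectors are aligned with that series. It chains scale_x_continuous() and theme() together and wraps them in suppressMessages(), because ggplot2's note about an x scale already being present is a message rather than an error.

```r
library(ggplot2)

# Hypothetical helper: swap the numeric week index on the x-axis of a
# breakout() plot for "year-week" labels.
#   p      ggplot object, e.g. res$plot, with the row index (1, 2, ...) on x
#   year   vector of years aligned with the series given to breakout()
#   week   vector of week numbers aligned with the same series
#   every  spacing between labels in weeks (an arbitrary choice)
label_weeks <- function(p, year, week, every = 26) {
  stopifnot(length(year) == length(week), length(year) > 0)
  idx <- seq(1, length(year), by = every)
  # Adding a second x scale makes ggplot2 print an "already present"
  # message; the new scale replaces the old one, so the message is muted.
  suppressMessages(
    p +
      scale_x_continuous(breaks = idx,
                         labels = paste(year[idx], week[idx], sep = "-")) +
      # Rotate the labels so the year-week strings do not overlap
      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
  )
}
```

For example, label_weeks(res$plot, data$Year, data$Week, every = 52) would give roughly one label per year, assuming the CSV's columns are named Year and Week.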

(NOTE: R prints a message, not an error, saying a scale for x is already present, but it still works because the new scale replaces the existing one. My guess is there is a way to combine the scale_x_continuous and the theme to get the labels in the right place and rotated.)
[Figure: updated BreakoutDetection plot with relabeled x-axis]
PEACE!
Tim
Timothy Wiemken PhD MPH CIC
Assistant Professor of Medicine
Assistant Director of Epidemiology and Biostatistics
University of Louisville School of Medicine
Division of Infectious Diseases
Clinical and Translational Research Support Unit
501 E. Broadway, #120B (not for much longer – moving down the hall on Monday Nov 3)
Louisville, KY 40202
@timwiemken
tlwiem01@louisville.edu

 


Tim Wiemken • October 27, 2014



  • Etienne DELAY

    Many thanks, Tim, a beautiful case study!