A model that combines clinical symptom reports, internet searches, and emergency room data predicts influenza hospitalizations better in affluent communities than in poorer ones.
A newly developed model for tracking influenza outbreaks could be useful, but so far it becomes less accurate as the poverty level of the population increases, according to a paper published in PLOS One.
Investigators from Northeastern University and the University of Texas at Austin built and tested a multi-source influenza surveillance system to improve on the existing model and to protect those most at risk for flu and flu complications by integrating traditional methods, electronic health records, and internet data. The study authors explained that digital disease surveillance has become popular and noted that this type of tracking tool can often correlate to “some degree” with epidemiological time series during seasonal outbreaks.
By combining digital surveillance with traditional methods, which can fall prey to biases due to socioeconomic gaps between patient groups with more and less access to health care, the study authors expected to capture more early warning signs.
For their model, the investigators used 3 data sources. The first was county-level percentages of emergency department visits for upper respiratory infection in the Dallas/Fort Worth area of Texas between 2007 and 2012. They also used the Centers for Disease Control and Prevention’s tool, ILINet, which tracks flu data gathered from health care providers across the U.S.
Finally, they used the now-deactivated Google Flu Trends, which estimated the number of flu patients per 100,000 people based on search terms associated with signs and symptoms of the flu and respiratory infections, the study authors said.
The study authors estimated the influenza hospitalization rate per 100,000 people in each ZIP code and found that hospitalization rates were more highly correlated with poverty level and with age greater than 65 years. Even after controlling for age, poverty and influenza burden remained correlated.
As the proportion of residents in poverty in each ZIP code increased, the study authors found that their data became less informative. For example, the model made its best predictions in the most affluent communities and its worst predictions in the ZIP codes where the poverty levels were between 21% and 48%. This was true no matter which data sources were used as predictors, the study authors said.
They noted that one possible explanation could be that the poorest ZIP codes were either out of sync with each other or more widely distributed throughout the study zone. However, upon further analysis, the investigators found the opposite to be true: the poorest ZIP codes were located no less closely together than the more affluent ZIP codes.
They also asked whether geographic clusters could explain the discrepancies in forecast accuracy. They were able to confirm that forecast accuracy decreases as the poverty level increases.
“Understanding how and why disease risk and health burden vary by socioeconomic status, race, ethnicity, immigration status, and other factors is essential for supporting a healthy and equitable society and economy,” lead study author Samuel V. Scarpino said in a press release. “Otherwise, new machine-learning and big-data systems are likely to perpetuate the existing biases of traditional decision-making systems.”
The study authors said they were surprised to find that more affluent communities provided more reliable data than poorer communities, and speculated that the Google tool and ILINet provide low coverage for at-risk populations. These gaps in coverage need to be reviewed and remedied, the study authors said, and the framework of their model could be applied to evaluate and integrate even more data sources in the future to address this issue.