“Federated learning,” a new machine learning technique, could improve prediction of COVID-19 outcomes and enhance patient triage and care.
A new machine learning technique, “federated learning,” that predicts risk of critical events and mortality in patients with COVID-19 from their presentation on admission could be used in different hospitals and locales to improve triage, care and outcomes, according to the developers of the model at Mount Sinai Health System in New York.
“Machine learning models in health care often require diverse and large-scale data to be robust and translatable outside the patient population they were trained on,” explained the study’s corresponding author, Benjamin Glicksberg, PhD, The Hasso-Plattner Institute for Digital Health at Mount Sinai and Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, in a statement released by the Mount Sinai Health System.
“Federated learning is gaining traction within the biomedical space as a way for models to learn from many sources without exposing any sensitive patient data. In our work, we demonstrate that this strategy can be particularly useful in situations like COVID-19,” Glicksberg said.
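To make the idea concrete, here is a minimal sketch of federated averaging, one common way to train a shared model without pooling records. The article does not say which federated scheme the investigators used, so the choice of technique is an assumption, as are the logistic regression model, the function names, and the synthetic data. Each simulated site fits the model on its own data, and only the resulting weights are sent back to be averaged.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs on one site's private data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def federated_average(site_weights, site_sizes):
    """Average the site models, weighting each by its number of records."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def make_site(n, seed):
    """Synthetic stand-in for one hospital's data: (bias, feature) -> label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.uniform(-1, 1)
        data.append(([1.0, x1], 1 if x1 > 0 else 0))
    return data


# Three hypothetical sites of different sizes; raw records never leave a site.
sites = [make_site(n, seed) for seed, n in enumerate([50, 80, 30])]
global_w = [0.0, 0.0]
for _ in range(20):  # communication rounds
    updates = [local_update(global_w, d) for d in sites]
    global_w = federated_average(updates, [len(d) for d in sites])
```

Only the weight vectors cross site boundaries in this sketch; a real deployment would also need secure transport and, typically, further privacy safeguards.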
Glicksberg and colleagues sought a machine learning model that would overcome a number of challenges, in addition to the inherent limitations of previous models. Those models were developed within single institutions, with algorithms trained on relatively small samples, on few demographic and clinical variables, and on circumscribed populations, so their results may not be generalizable.
“Additionally, patients with COVID-19 demonstrate varying symptomatology, making safe and successful triaging difficult,” the investigators point out. “Identification of key patient characteristics that govern the course of disease across patient cohorts is important, particularly given its potential to aid physicians and hospitals in predicting disease trajectory, allocating essential resources effectively, and improving outcomes.”
Glicksberg and colleagues selected the Extreme Gradient Boosting (XGBoost) algorithm to implement what they describe as “boosted decision trees on continuous and one-hot encoded categorical features.” They applied the analyses within a classification framework, they explained, “because we aimed to implement our models with regard to clinically relevant time boundaries for resource allocation and clinical decision-making, such as resource allocation, triage, and decisions for ICU transfer.”
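The following sketch illustrates the two data-preparation ideas named above: one-hot encoding a categorical feature, and framing outcomes as yes/no labels at fixed time boundaries. The field names, the category list, and the patient record are invented for illustration, and the exact label definition is an assumption, since the article does not spell out how the investigators constructed theirs.

```python
HORIZONS = (3, 5, 7, 10)             # days after admission, as in the study
SEX_CATEGORIES = ("female", "male")  # hypothetical category list


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector, one slot per category."""
    return [1 if value == c else 0 for c in categories]


def event_within(days_to_event, horizon_days):
    """Binary label: 1 if the event happened within the horizon, else 0.

    days_to_event is None when the event never occurred.
    """
    return int(days_to_event is not None and days_to_event <= horizon_days)


# Made-up record: one continuous feature, one categorical feature, one outcome.
patient = {"age": 67, "sex": "male", "days_to_death": 6}

features = [patient["age"]] + one_hot(patient["sex"], SEX_CATEGORIES)
labels = {h: event_within(patient["days_to_death"], h) for h in HORIZONS}
```

Under this framing, each time boundary becomes its own binary classification problem, so a boosted-tree classifier could be fitted separately for each horizon.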
The investigators accessed the electronic health records of 4098 patients admitted with laboratory-confirmed COVID-19 to 5 hospitals in New York City from March 15 to May 22, 2020. They first evaluated the XGBoost and baseline comparator models for predicting in-hospital mortality and critical events at 3, 5, 7 and 10 days after admission, using patients admitted to one hospital on or before May 1 (n=1514). The model was then externally validated on data from patients admitted to the other 4 hospitals on or before May 1 (n=2201), and prospectively validated on all patients admitted after May 1 (n=383).
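The three-way split described above can be expressed as a short rule. This is only a sketch of the partitioning logic as the article reports it; the hospital identifiers and the function name are hypothetical, and the cohort sizes are the figures quoted in the text.

```python
from datetime import date

CUTOFF = date(2020, 5, 1)
DEVELOPMENT_SITE = "hospital_A"  # hypothetical identifier for the first hospital


def assign_cohort(hospital, admitted):
    """Place an admission into one of the three cohorts described above."""
    if admitted > CUTOFF:
        return "prospective"   # all hospitals, admitted after May 1
    if hospital == DEVELOPMENT_SITE:
        return "development"   # one hospital, on or before May 1
    return "external"          # the other 4 hospitals, on or before May 1


# Cohort sizes as reported in the article; together they cover all patients.
REPORTED_SIZES = {"development": 1514, "external": 2201, "prospective": 383}
TOTAL_PATIENTS = sum(REPORTED_SIZES.values())
```

The reported cohort sizes sum to the 4098 patients in the study, so the three groups account for every admission.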
Glicksberg and colleagues report that, at 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest predictors of critical events. Higher age, anion gap, and C-reactive protein were the strongest correlates of mortality. Higher respiratory rate and D-dimer levels were also associated with higher mortality, while lower diastolic blood pressure was negatively associated.
"We hope that this work showcases benefits and limitations of using federated learning with electronic health records for a disease that has a relative dearth of data in an individual hospital," said study first author, Akhil Vaid, MD, Department of Genetics and Genomic Sciences, Ichan School of Medicine at Mount Sinai, in the released statement.