Investigators use Salmonella genomes to test how accurately machine learning models can predict antibiotic resistance.
With food recalls making headlines—and people sick—across the country, clinicians desperately need new approaches for identifying pathogens and diagnosing related illnesses.
A recent study suggests a solution to this challenge may be on the horizon.
In a collaboration that included computer scientists, engineers, and infectious disease specialists, researchers have built models that use whole genome sequence data to predict minimum inhibitory concentrations (MICs) for nontyphoidal Salmonella strains collected and sequenced through the National Antimicrobial Resistance Monitoring System between 2002 and 2016. The team published their findings this fall in the Journal of Clinical Microbiology (JCM). Importantly, they believe their whole genome sequence-based models “can be readily applied to other important human pathogens” and thereby “guide responses to outbreaks and inform antibiotic stewardship decisions.”
Investigators could not be reached for comment by deadline; however, in their concluding remarks, they noted, “In this study, we have built machine learning-based MIC prediction models for nontyphoidal Salmonella genomes using XGBoost that achieve overall accuracies of 95% to 96%... To our knowledge, this is one of the largest and most accurate MIC prediction models to be published to date. Importantly, it provides a model strategy for performing MIC prediction directly from genome sequence data that could be applied to other human or veterinary pathogens.”
The investigators, who published a similar approach for Klebsiella pneumoniae earlier this year, constructed their model by first dividing each genome into sets of nonredundant overlapping nucleotide 10-mers using the k-mer counting program KMC. From those k-mers, they built a matrix in which each row contains the k-mers for one genome along with the MIC for a single antibiotic. They then used an XGBoost regressor to build the MIC prediction model. Investigators describe XGBoost as “a computationally scalable method for generating gradient boosted models. Gradient boosting is an ensemble method by which decision trees are generated to minimize an error function.”
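To make the matrix-building step concrete, here is a minimal sketch in Python. It is not the investigators' pipeline: plain Python stands in for KMC, the sequences and MIC values are invented, the function names `kmer_counts` and `build_matrix` are hypothetical, and the use of k-mer counts (rather than presence or absence) as features is an assumption. The toy run uses 4-mers so the output stays readable; the study used 10-mers.

```python
from collections import Counter


def kmer_counts(sequence, k=10):
    """Count every overlapping k-mer in one genome sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_matrix(genomes, mics, k=10):
    """Return (feature_names, rows).

    Each row holds the k-mer counts for one genome, in feature order,
    followed by that genome's MIC for a single antibiotic.
    """
    counts = {name: kmer_counts(seq, k) for name, seq in genomes.items()}
    # The columns are the distinct k-mers seen across all genomes.
    features = sorted(set().union(*counts.values()))
    rows = {
        name: [c[kmer] for kmer in features] + [mics[name]]
        for name, c in counts.items()
    }
    return features, rows


# Invented toy data: two short "genomes" and made-up MICs.
genomes = {"strain_a": "ACGTACGTAC", "strain_b": "ACGTTTGTAC"}
mics = {"strain_a": 2.0, "strain_b": 16.0}

features, rows = build_matrix(genomes, mics, k=4)
for name, row in rows.items():
    print(name, row)
```

In a setup like this, the k-mer columns would serve as the inputs and the final column as the target for a regressor such as XGBoost.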
To evaluate the accuracy of their model, the team used a collection of 5278 nontyphoidal Salmonella genomes to generate XGBoost-based machine learning models for MICs for 15 antibiotics: ampicillin, amoxicillin/clavulanic acid, ceftriaxone, azithromycin, chloramphenicol, ciprofloxacin, trimethoprim/sulfamethoxazole, sulfisoxazole, cefoxitin, gentamicin, kanamycin, nalidixic acid, streptomycin, tetracycline, and ceftiofur. The MIC prediction models, tested with 10-fold cross-validation, were found to have an overall average accuracy of 95% within ±1 two-fold dilution step, an average very major error (VME) rate of 2.7%, and an average major error (ME) rate of 0.1%.
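The three figures reported above can be illustrated with a short Python sketch. It assumes the conventional definitions: a prediction is accurate if it falls within one two-fold dilution of the measured MIC, a very major error is a resistant isolate predicted as susceptible, and a major error is a susceptible isolate predicted as resistant. The single `resistant_breakpoint` cutoff (which ignores any intermediate category), the function names, and the MIC pairs are all hypothetical and are not taken from the study.

```python
import math


def within_one_dilution(predicted, measured):
    """True if the prediction is within +/-1 two-fold dilution step."""
    return abs(math.log2(predicted / measured)) <= 1 + 1e-9


def evaluate(pairs, resistant_breakpoint):
    """Summarize (predicted MIC, measured MIC) pairs for one antibiotic.

    Simplifying assumption: a MIC at or above the breakpoint is resistant,
    anything below it is susceptible.
    """
    accurate = sum(within_one_dilution(p, m) for p, m in pairs)
    resistant = [(p, m) for p, m in pairs if m >= resistant_breakpoint]
    susceptible = [(p, m) for p, m in pairs if m < resistant_breakpoint]
    # Very major error: truly resistant, predicted susceptible.
    vme = sum(p < resistant_breakpoint for p, _ in resistant)
    # Major error: truly susceptible, predicted resistant.
    me = sum(p >= resistant_breakpoint for p, _ in susceptible)
    return {
        "accuracy": accurate / len(pairs) if pairs else 0.0,
        "vme_rate": vme / len(resistant) if resistant else 0.0,
        "me_rate": me / len(susceptible) if susceptible else 0.0,
    }


# Invented (predicted, measured) MIC pairs and a made-up breakpoint.
pairs = [(2, 2), (4, 2), (16, 2), (32, 32), (4, 32), (16, 32)]
result = evaluate(pairs, resistant_breakpoint=16)
print(result)
```

Note that the VME rate is computed against the resistant isolates only and the ME rate against the susceptible isolates only, which is why a model can score well on overall accuracy and still have a VME rate that matters clinically.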
“The model predicts MICs with no a priori information about the underlying gene content or resistance phenotypes of the strains,” they wrote. “By selecting diverse genomes for training sets, we show that highly accurate MIC prediction models can be generated with fewer than 500 genomes. We also show that our approach for predicting MICs is stable over time despite annual fluctuations in antimicrobial resistance gene content in the sampled genomes. Finally, using feature selection, we explore the important genomic regions identified by the models for predicting MICs.”
In a related commentary also published in JCM, Jonathan M. Monk, PhD, department of bioengineering, University of California San Diego, wrote, “Requirements for antimicrobial-resistance diagnostic devices are strict. [US Food and Drug Administration] standards for automated systems recommend a ME rate ≤ 3%. All antibiotic MICs predicted [here] have ME rates in this range… [and] models for 7 of the 15 antibiotics (amoxicillin/clavulanic acid, ceftriaxone, chloramphenicol, cefoxitin, streptomycin, tetracycline and ceftiofur) had acceptable VME rates… Rapid identification and targeted treatment of pathogenic bacteria using tools assisted by algorithms presented here… would enable precision medicine for pathogens that would lower the incidence of antibiotic resistance, improve patient health, and lead to decrease hospital costs.”