Does machine learning improve outcomes in sepsis? Adams and colleagues tested this possibility in a recent Nature Medicine paper titled “Prospective, multi-site study of patient outcomes after implementation of the TREWS machine learning-based early warning system for sepsis.”
Background: Sepsis remains a major killer of patients in the hospital, so any study claiming a survival benefit will garner lots of attention. To date, despite billions of dollars in drug development, sepsis has lacked a satisfactory treatment. Still, the conventional wisdom is that patients will do better if we identify sepsis quickly and start fluids and antibiotics as soon as possible.
The investigators wanted to know whether a machine learning tool that sifts through patient data and uses proprietary software to identify sepsis increases survival. Importantly, this was not a randomized controlled trial. Instead, the comparison was between patients whose treating physician confirmed an AI-generated alert in the medical record within 3 hours and patients whose alerts were confirmed later. The investigators compared outcomes, including mortality, in these two groups and found impressive differences in mortality between them.
The study population came from 5 hospitals that were monitored by the machine learning tool, called TREWS. In total, 6,877 patient encounters were flagged by TREWS and met inclusion criteria.
Adams et al argue that their methodology (comparing the timing of alert confirmation by the treating physicians) made use “of natural variability in provider practice to create a convenience control group.” This language implies that patients whose alerts were confirmed before and after 3 hours differed little from each other. That was not so. Some of the patient characteristics from the supplementary data of the Adams et al study are shown below:
There are substantial and important differences! Those with < 3 hour alert confirmations had significantly lower blood pressure, higher respiratory rates, faster heart rates, and higher white blood cell counts. Notably, they also had higher temperatures. It is safe to say that these patients appeared sicker; nurses and doctors would have treated them differently, and those differences might have made doctors even more inclined to confirm the electronic TREWS alert. On the other hand, the later group may have included patients with vaguer presenting symptoms. More on that below.
What did Adams and colleagues show? Patients in the < 3 hour group had lower unadjusted mortality (14.6% versus 19.2%, P < 0.001) and more favorable progression of SOFA score, a measure of sepsis severity (−0.8 versus −0.4, P < 0.001). After statistical adjustment for possible confounding variables, including vital signs and patient demographics, the < 3 hour group still had lower mortality (adjusted risk difference (ARD) −3.34%, CI −5.10, −1.67%; adjusted relative reduction (ARR) −18.18%, CI −26.31, −9.65%; P < 0.001) and improved SOFA progression (ARD −0.26; CI −0.42, −0.11; P = 0.001). Given the baseline differences between the two groups, I was relieved that the investigators adjusted for patient characteristics. Indeed, the adjusted results look pretty good.
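As a rough check on the headline numbers, the two unadjusted mortality rates can be turned into an absolute risk difference and a relative reduction. The short Python sketch below does only that arithmetic on the 14.6% and 19.2% figures quoted above; it does not reproduce the authors' adjusted model, and the helper function names are illustrative, not taken from the paper.

```python
def risk_difference(p_exposed: float, p_reference: float) -> float:
    """Absolute risk difference, in percentage points (exposed minus reference)."""
    return p_exposed - p_reference


def relative_reduction(p_exposed: float, p_reference: float) -> float:
    """Relative change in risk, as a percentage of the reference-group risk."""
    return 100.0 * (p_exposed - p_reference) / p_reference


# Unadjusted mortality (%) for the < 3 hour and later alert-confirmation groups.
early, late = 14.6, 19.2

rd = risk_difference(early, late)
rr = relative_reduction(early, late)
print(f"Unadjusted risk difference: {rd:.1f} percentage points")
print(f"Unadjusted relative reduction: {rr:.1f}%")
```

This gives about −4.6 percentage points and a relative reduction of roughly 24%. Both are larger in magnitude than the adjusted estimates (−3.34% and −18.18%), so adjusting for the measured covariates shrank, but did not eliminate, the apparent benefit.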
Still, this study shares the central limitation of any observational study: association is not causation. Virtually all observational studies make an effort to account for covariates that might be sources of confounding. Yet, again and again, promising observational results involving interventions for sepsis have been refuted by subsequent randomized controlled trials. Likewise, this study did not eliminate all the confounding variables that might skew its results. As I will argue below, vague presenting symptoms may have been an unacknowledged confounder in this study.
Adams et al argue that earlier antibiotic treatment is one benefit of early sepsis detection. However, because of the way the authors did their analysis, we do not know for sure that antibiotic timing is the reason why patients with earlier alert confirmation did better. One way to find out would be to include antibiotic timing as a covariate in the model and see whether the survival benefit of < 3 hour alert confirmation disappears. They did not provide such an analysis. Instead, they did a separate analysis showing that, across the whole cohort, patients who received antibiotics within 3 hours had improved survival. I truly wonder about this. Was there no difference in antibiotic timing between the groups (< 3 hour and later alert confirmation) that were the subject of their primary analysis?
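The covariate check proposed above can be illustrated with simulated data. The Python sketch below is entirely hypothetical: it invents a cohort in which early alert confirmation (`early_confirm`) lowers mortality only by making early antibiotics (`early_abx`) more likely. Every probability is a made-up assumption, not a value from Adams et al. Comparing mortality by confirmation timing within each antibiotic-timing stratum, a simple stratified analysis standing in for adding antibiotic timing as a regression covariate, shows whether the apparent benefit of early confirmation survives.

```python
import random


def simulate(n: int, seed: int = 1):
    """Hypothetical cohort: confirmation timing affects death only via antibiotic timing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        early_confirm = rng.random() < 0.5
        # Assumed: early confirmation makes early antibiotics more likely.
        early_abx = rng.random() < (0.8 if early_confirm else 0.3)
        # Assumed: mortality depends on antibiotic timing alone.
        died = rng.random() < (0.14 if early_abx else 0.20)
        rows.append((early_confirm, early_abx, died))
    return rows


def confirm_difference(rows):
    """Mortality difference, early minus late alert confirmation."""
    early = [died for confirm, _, died in rows if confirm]
    late = [died for confirm, _, died in rows if not confirm]
    return sum(early) / len(early) - sum(late) / len(late)


rows = simulate(200_000)
crude = confirm_difference(rows)
# Stratify on antibiotic timing and recompute the confirmation effect in each stratum.
stratified = {abx: confirm_difference([r for r in rows if r[1] == abx]) for abx in (True, False)}

print(f"Crude difference: {crude:+.3f}")
for abx, diff in stratified.items():
    print(f"Early antibiotics={abx}: difference {diff:+.3f}")
```

In this invented setup, which deliberately has nothing else linking antibiotic timing to death, the crude comparison shows roughly a 3-point mortality advantage for early confirmation, while the within-stratum differences sit near zero. A real analysis in which the benefit persisted after accounting for antibiotic timing would point to something other than antibiotics.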
The way I see it, the later confirmation group had vaguer presenting symptoms of sepsis and probably showed less clear-cut signs of infection (see the patient characteristic table above). If I were taking care of these patients, I might hesitate to confirm the sepsis flag, especially if I suspected that they had some alternate, non-sepsis diagnosis.
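The confounding argument can be sketched the same way. In the hypothetical cohort below, a vague presentation (`vague`) makes clinicians slower to confirm the alert and independently raises mortality, while confirmation timing has no effect on death by construction. The prevalence and risk values are invented for illustration only and are not drawn from Adams et al or any other study.

```python
import random


def simulate(n: int, seed: int = 2):
    """Hypothetical cohort: vague presentation delays confirmation and raises mortality."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        vague = rng.random() < 0.4
        # Assumed: vague presentations are less likely to be confirmed within 3 hours.
        early_confirm = rng.random() < (0.3 if vague else 0.7)
        # Assumed: death depends on presentation only; confirmation timing has no effect.
        died = rng.random() < (0.30 if vague else 0.15)
        rows.append((vague, early_confirm, died))
    return rows


def confirm_difference(rows):
    """Mortality difference, early minus late alert confirmation."""
    early = [died for _, confirm, died in rows if confirm]
    late = [died for _, confirm, died in rows if not confirm]
    return sum(early) / len(early) - sum(late) / len(late)


rows = simulate(200_000)
crude = confirm_difference(rows)
# Stratify on the confounder (presentation type) and recompute within each stratum.
by_presentation = {v: confirm_difference([r for r in rows if r[0] == v]) for v in (True, False)}

print(f"Crude difference: {crude:+.3f}")
for v, diff in by_presentation.items():
    print(f"Vague presentation={v}: difference {diff:+.3f}")
```

Here early confirmation looks protective in the crude comparison (roughly 6 percentage points lower mortality) even though it does nothing, and stratifying on presentation type removes the association. To the extent that vague presentation was not captured by the covariates Adams et al adjusted for, an association of this kind could survive their adjustment.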
Previous work shows that vague symptoms themselves are dangerous for patients ultimately diagnosed with sepsis:
In 2018, Filbin et al. published a Critical Care paper titled “Presenting Symptoms Independently Predict Mortality in Septic Shock: Importance of a Previously Unmeasured Confounder.” Vague presenting symptoms included weakness, fatigue, and absence of fever. Patients with these symptoms were compared to those with fever or other explicit signs of infection (cough, urinary pain, or skin redness). In this study, patients with vague symptoms had a higher mortality rate.
But delays in antibiotics were apparently not the problem in the Filbin study: “Consistent with our hypothesis, we found differences in mortality, 34% versus 16%, for septic patients who presented to the ED with vague versus explicit symptoms of infection…Our data did not support the corollary to our hypothesis that antibiotic delay was a primary driver of mortality.”
Of these results, NEJM Journal Watch editorialized: “I would have lost money betting that delayed antibiotics was the key factor contributing to increased mortality in these patients with vague presenting symptoms. The authors suggest instead that vague symptoms may represent a completely different phenotype of sepsis.”
Looking deeper at Filbin et al.’s results, it is likely that the absence of fever explains much of the link between vague symptoms and death. Fever was a major determinant of “vague presentation”.
Focusing on fever makes the Filbin paper appear less groundbreaking. In the Filbin study, vague presentation often simply meant afebrile, or worse, hypothermic. Previous work has shown that lack of fever or low body temperature heralds a poor prognosis in sepsis. For example, a study of 2,225 patients with sepsis in Sweden, “Fever in the Emergency Department Predicts Survival of Patients With Severe Sepsis and Septic Shock Admitted to the ICU,” showed an inverse relationship between fever and mortality. As readers know, the evolution of fever has been a recurring theme of my writing here and in the published literature.
Returning to Adams et al.: Their results hint that higher body temperatures are protective in sepsis, in accord with many previous studies. The current study does not, in the end, show that a computerized AI tool improves outcomes. After all, every patient in this study received the TREWS machine learning alert, and there was no traditional control group.
One important lesson of this study is that we should pay extra attention to patients with sepsis who appear less sick and less febrile. Paradoxically, these patients often do worse for reasons that have nothing to do with antibiotic timing (as in Filbin). Or they may do worse because of some combination of antibiotic treatment, treatment quality, and a reduced ability to mount a robust physiological response to the infection.
As a final thought, I can only hope that someone attempts to reproduce the current study before it generates all manner of iron-clad “performance” guidelines. I won’t hold my breath. Lack of concrete evidence has not stopped guideline authors before. I suspect that because this study “confirms” what we think we already know, it will become standard of care without replication or a better designed confirmatory study.
Copyright © Joe Alcock MD
Emergency Physician, Educator, Researcher, interested in the microbiome, evolution, and medicine