If not for an anthropologist and a sociologist, the leaders of a prominent health innovation hub at Duke University would never have known that the clinical AI tool they had been using on hospital patients for two years was making life far more difficult for the hospital’s nurses.
The tool, which uses deep learning to determine the chances a hospital patient will develop sepsis, has had an overwhelmingly positive impact on patients. But the tool required that nurses present its results — in the form of a color-coded risk scorecard — to clinicians, including physicians they’d never worked with before. It disrupted the hospital’s traditional power hierarchy and workflow, rendering nurses uncomfortable and doctors defensive.
As a growing number of leading health systems rush to deploy AI-powered tools to help predict outcomes — often under the premise that they will boost clinicians’ efficiency, decrease hospital costs, and improve patient care — far less attention has been paid to how the tools impact the people charged with using them: frontline health care workers.
That’s where the sociologist and anthropologist come in. The researchers are part of a larger team at Duke that is pioneering a uniquely inclusive approach to developing and deploying clinical AI tools. Rather than deploying externally developed AI systems — many of which haven’t been tested in the clinic — Duke creates its own tools, starting by drawing from ideas among staff. After a rigorous review process that loops in engineers, health care workers, and university leadership, social scientists assess the tools’ real-world impacts on patients and workers.
The team is developing other strategies as well, not only to make sure the tools are easy for providers to weave into their workflow, but also to verify that clinicians actually understand how they should be used. As part of this work, Duke is brainstorming new ways of labeling AI systems, such as a “nutrition facts” label that makes it clear what a particular tool is designed to do and how it should be used. The team is also regularly publishing peer-reviewed studies and soliciting feedback from hospital staff and outside experts.
“You want people thinking critically about the implications of technology on society,” said Mark Sendak, population health and data science lead at the Duke Institute for Health Innovation.
Otherwise, “we can really mess this up,” he added.
Getting practitioners to adopt AI systems that are either opaquely defined or poorly introduced is arduous work. Clinicians, nurses, and other providers may be hesitant to embrace new tools — especially those that threaten to interfere with their preferred routines — or they may have had a negative prior experience with an AI system that was too time-consuming or cumbersome.
The Duke team doesn’t want to create another notification that causes a headache for providers — or one that’s easy for them to ignore. Instead, they’re focused on tools that add clear value. The easiest starting point: ask health workers what would be helpful.
“You don’t start by writing code,” said Sendak, the data science lead. “Eventually you get there, but that happens in parallel with clinicians around the workflow design.”
That involves some trial and error, like in the case of the sepsis tool. It was only when the social science researchers reviewed the rollout of that tool that they saw the process was anything but seamless.
While the sepsis algorithm succeeded in slotting patients into the appropriate risk category and directing the most care to the highest-risk individuals, it also quickly created friction between nurses and clinicians. Nurses who had never before directly interacted with attending physicians — and who worked in a different unit on a different floor — were suddenly charged with calling them and communicating patients’ sepsis results.
“Having neither a previous nor current face-to-face relationship with the doctors [the nurses] were calling was unusual and almost prohibitive to effectively working together,” Madeleine Clare Elish, an anthropologist who previously served as the program director at the Data and Society Research Institute, and Elizabeth Anne Watkins, a sociologist and affiliate at the Data and Society Research Institute, wrote in their report.
The Duke nurses came up with a range of strategies to deal with this issue, such as timing their calls around physicians’ schedules so the doctors would be in a headspace to be more receptive. At times, they bundled their calls, discussing several patients at once so they wouldn’t be seen as a repeated disruption. But that effort, which Elish and Watkins called “repair work,” is difficult and emotional, and takes another toll on nurses’ well-being.
Had it not been for the sociological research, the extra labor being taken on by the Duke nurses might have gone unnoticed, which could have created more problems down the road — and perhaps would have shortened the lifespan of the AI model.
Ideally, the Duke team will take the researchers’ findings into account as they continue to hone the sepsis model, making sure the tool distributes work fairly across all of the hospital staff.
“Duke is putting a lot of effort into addressing equity by design,” said Suresh Balu, associate dean for innovation and partnership at the Duke School of Medicine and program director of the Duke Institute for Health Innovation. “There is lots to be done, but the awareness is improving.”
Every year since 2014, the Duke team has put out a formal request for applications asking frontline health care workers, from clinicians and nurses to students and trainees, to pinpoint the most pressing issues they encounter on the hospital floor and propose potential tech-driven solutions to those problems. Neither artificial intelligence nor machine learning is a requirement, but so far, a majority of the proposals have included one or both.
“They come to us,” Balu said.
Previous projects have produced AI tools designed to save clinicians time and effort, such as an easy-to-use algorithm that spots urgent heart problems in patients. Others improve the patient experience, such as a deep learning tool that scans photographs of dermatology patients’ skin and lets clinicians more rapidly slot them into the appropriate treatment pathways for faster care.
Once the models are crafted by a group of in-house engineers and medical staff, reviewed by the innovation team and associate dean, and released, the social scientists study their real-world impacts. Among their questions: How do you ensure that frontline clinicians actually know when — and when not — to use an AI system to help inform a decision about a patient’s treatment? Clinicians, engineers, and frontline staff have a constant feedback loop in weekly faculty meetings.
“At the frontline level we talked about it in our weekly faculty meeting — the point person on that project would say, ‘Do you have any feedback on it?’ And then the next month they’d say, ‘OK, this is what we heard last month so we did this. Does anyone have any feedback on that?’” said Dan Buckland, assistant professor of surgery and mechanical engineering at Duke University Hospital. He said personally, he’s “had a lot of questions” about various AI tools being developed and implemented.
“And so far no one has been too busy to answer them,” added Buckland, who has also been involved in developing some of the AI systems that Duke is currently using.
Duke’s approach is an effort at transparency at a time when the vast majority of AI tools remain understudied and often poorly understood among the broader public. Unlike drug candidates, which are required to pass through a series of rigorous steps as part of the clinical trial process, there’s no equivalent evaluation system for AI tools, which experts say poses a significant problem.
Already, some AI tools have been shown to worsen or contribute to existing health disparities — particularly along the lines of race, gender, and socioeconomic status. There is also no standard way for AI system developers to communicate AI tools’ intended uses, limitations, or general safety.
“It’s a free-for-all,” Balu said.
Constantly evolving AI systems are, in many ways, harder to evaluate than drugs, which generally do not change once they are approved. In a paper published in April in the journal npj Digital Medicine, Harvard Law School professor Glenn Cohen proposed one potential fix for that: Rather than being evaluated as static products, AI tools should be assessed as systems capable of being reevaluated in step with their evolution.
“This shift in perspective — from a product view to a system view — is central to maximizing the safety and efficacy of AI/ML in health care,” Cohen wrote.
A big part of evaluating that system is closely examining how it works in the clinic. At Duke, the researchers aren’t just looking at how accurate their models are, but also at how effective they are in the real-world setting of a hectic hospital.
While it crowdsources ideas for tools, the team is also getting creative about making sure clinicians understand how to use the tools it develops. That step could prove crucial to taking an AI model from an accurate predictive tool to an actually useful technology.
A prime example of those efforts: the nutrition facts label Duke researchers have tested with their sepsis model.
In a paper published in March in the journal npj Digital Medicine, the team presented a prototype of a label that included a summary of the tool’s intended uses and directions, warnings, and data on the tool’s validation and performance.
“We wanted to clearly define what the tool is, where you can use it, and more importantly, where you should not use it,” Balu said.
This is part of a yearlong series of articles exploring the use of artificial intelligence in health care that is partly funded by a grant from the Commonwealth Fund.