SIIM20: The challenge of creating robust, clinically useful AI algorithms
July 01, 2020
by Lisa Chamoff, Contributing Reporter
It’s no secret that artificial intelligence (AI) is here to stay in radiology, but making AI algorithms useful in clinical practice is easier said than done.
At a webinar during the SIIM20 Virtual Meeting, Dr. Daniel Rubin, professor of biomedical data science and director of biomedical informatics at Stanford University, outlined several challenges in developing robust AI algorithms.
Rubin explained that most AI models are built with data from only one or two institutions and may not generalize to data they haven’t seen before. Such models may not account for differences in patient populations or in imaging equipment and acquisition parameters, and rare disorders may be underrepresented in the training data.
“The data might not be representative of the real world,” Rubin said.
A recent study of nearly 160,000 chest X-rays from three institutions, used to build models that detect pneumonia, found that performance varied depending on which data sets were used for training and testing.
“In general, reliability is a problem, depending on how you train the data,” Rubin said.
One way to get around that is to augment the data and train the model on additional, modified images.
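As a rough illustration of what augmentation means in practice, the sketch below expands a labeled image set with randomly shifted and brightness/contrast-adjusted copies of each image. It is a minimal, stdlib-only example under stated assumptions: images are small grayscale arrays with intensities in [0, 1], and the helper names (`shift`, `adjust_intensity`, `augment`, `augment_dataset`) and jitter ranges are hypothetical choices for illustration, not the method used in any study mentioned here. Horizontal flips are deliberately left out, since chest anatomy is not left-right symmetric.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: a grayscale image as rows of intensities in [0, 1].
Image = List[List[float]]


def shift(image: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Translate the image by (dx, dy) pixels, filling exposed borders with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = image[src_y][src_x]
    return out


def adjust_intensity(image: Image, gain: float, bias: float) -> Image:
    """Apply a linear brightness/contrast change, clipping results to [0, 1]."""
    return [[min(1.0, max(0.0, gain * p + bias)) for p in row] for row in image]


def augment(image: Image, rng: random.Random, max_shift: int = 2) -> Image:
    """Return one randomly perturbed copy of `image` (small shift plus intensity jitter)."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    gain = rng.uniform(0.9, 1.1)   # assumed contrast range
    bias = rng.uniform(-0.05, 0.05)  # assumed brightness range
    return adjust_intensity(shift(image, dx, dy), gain, bias)


def augment_dataset(
    images: Sequence[Image], labels: Sequence[int], copies_per_image: int, seed: int = 0
) -> Tuple[List[Image], List[int]]:
    """Keep the originals and append `copies_per_image` augmented variants of each."""
    rng = random.Random(seed)
    out_images, out_labels = list(images), list(labels)
    for img, lab in zip(images, labels):
        for _ in range(copies_per_image):
            out_images.append(augment(img, rng))
            out_labels.append(lab)  # augmentation does not change the label
    return out_images, out_labels
```

Augmentation only reshuffles information already present in the training images, which is consistent with Rubin's point that it cannot fully substitute for more annotated data.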
“Ultimately, that isn’t enough,” Rubin said. “You really need to get as much annotated data as possible. And it’s impossible to get an infinite amount of quality annotated data because it’s very costly to get these annotations done when you’ve done this research and tried to convince radiologists to annotate cases. You know how anxious they are to do these annotations for free.”
It is possible to generate so-called “weak data” by taking images that aren’t already annotated and generating labels for them automatically. Rubin cited a recent study in which an algorithm trained on 200,000 cases with weak labels performed better than one trained on 20,000 cases with high-quality labels.
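The article does not say how the weak labels in that study were produced. One common approach, assumed here purely for illustration, is to mine the free-text radiology report that accompanies each image with simple keyword and negation rules. The sketch below is a hypothetical, deliberately crude labeler: the finding patterns, negation cues, and the `weak_label` function are invented for this example, and real report labelers use far richer rules.

```python
import re
from typing import Optional

# Hypothetical keyword patterns for two findings.
FINDING_PATTERNS = {
    "pneumonia": re.compile(r"\bpneumonia\b", re.IGNORECASE),
    "effusion": re.compile(r"\beffusions?\b", re.IGNORECASE),
}
# Hypothetical negation cues that must appear before the finding in the same sentence.
NEGATION_CUES = re.compile(r"\b(no|without|negative for|free of)\b", re.IGNORECASE)


def weak_label(report: str, finding: str) -> Optional[int]:
    """Return 1 for an affirmed mention, 0 if only negated mentions, None if never mentioned."""
    pattern = FINDING_PATTERNS[finding]
    mentioned = False
    for sentence in re.split(r"[.\n]", report):
        match = pattern.search(sentence)
        if not match:
            continue
        mentioned = True
        if not NEGATION_CUES.search(sentence[: match.start()]):
            return 1  # at least one non-negated mention
    return 0 if mentioned else None
```

Labels produced this way are noisy, which is what makes them "weak"; the trade-off described above is that many noisy labels can still beat far fewer careful ones.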
“The more and more data you have, the better the performance” of the algorithm, Rubin said.
It’s better to gather data from multiple sites, but doing so is challenging because of data storage and legal issues. One solution is federated learning, to “bring the model to the data instead of the data to the model,” Rubin said. However, centralized data generally still yields better results: heterogeneity in data across sites degrades federated learning, labels vary from site to site, and not all institutions have sufficient IT hardware.
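To make "bring the model to the data" concrete, here is a minimal sketch of federated averaging (FedAvg), one widely used federated learning scheme; the article does not say which variant Rubin had in mind. Under these assumptions, each site trains a simple logistic regression model on its own data, and only the resulting weight vectors are sent to a central server, which averages them in proportion to each site's number of examples. The function names, learning rate, and round counts are hypothetical choices for illustration, and a real imaging model would be a neural network rather than a linear classifier.

```python
import math
from typing import List, Sequence, Tuple

# One training example: (feature vector with a leading 1.0 bias term, 0/1 label).
Example = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def local_train(weights: Sequence[float], data: Sequence[Example],
                lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Plain SGD on log loss at a single site; the raw data never leaves the site."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # gradient of log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_averaging(sites: Sequence[Sequence[Example]], n_features: int,
                        rounds: int = 20, lr: float = 0.1,
                        local_epochs: int = 5) -> List[float]:
    """Server loop: broadcast weights, collect local updates, average by site size."""
    global_w = [0.0] * n_features
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        # Each site starts from the current global model and trains on its own data.
        updates = [(local_train(global_w, d, lr, local_epochs), len(d)) for d in sites]
        # Only weight vectors are shared; the server takes a size-weighted average.
        global_w = [sum(w[j] * n for w, n in updates) / total for j in range(n_features)]
    return global_w
```

The sketch also hints at the weaknesses Rubin listed: when sites hold very different data or inconsistent labels, their local updates pull the shared weights in different directions, and every site must have enough compute to run the local training step.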
There is a low barrier to entry in getting an algorithm up and running, especially in the era of COVID-19, noted Jayashree Kalpathy-Cramer, associate professor of radiology at Harvard Medical School and an assistant in neuroscience at Massachusetts General Hospital. Although it’s very easy to create an AI algorithm these days, it’s difficult to create one that is broad, robust, unbiased, fair, self-aware and provides measures of uncertainty.
“Most of the publications have very significant data set biases, in that they’ve used different data sets for the COVID cases compared to normal cases or pneumonias,” Kalpathy-Cramer said. “What we end up seeing is that these algorithms that supposedly are performing at such a high level are really just learning the differences between the data sets.”
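A simple first check for the kind of data set bias Kalpathy-Cramer describes is to tabulate where each class's images come from. If one source supplies essentially only COVID cases and another only normal studies, a model can score well by recognizing the source rather than the disease. The sketch below flags sources dominated by a single label; the record format, the source names, the `confounded_sources` helper, and the 95% threshold are all hypothetical choices for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def label_by_source(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Cross-tabulate label counts per data source from (source, label) pairs."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for source, label in records:
        table[source][label] += 1
    return table


def confounded_sources(records: Iterable[Tuple[str, str]],
                       threshold: float = 0.95) -> Dict[str, str]:
    """Return sources where a single label makes up at least `threshold` of images."""
    flagged = {}
    for source, counts in label_by_source(records).items():
        total = sum(counts.values())
        label, n = counts.most_common(1)[0]
        if n / total >= threshold:
            flagged[source] = label  # source and label are nearly interchangeable here
    return flagged
```

A flagged source does not prove a model is cheating, but it means accuracy on that data cannot distinguish learning the disease from learning the data set.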