Radiologists look at different factors than artificial intelligence applications do when diagnosing breast cancer from medical images.
In a study, NYU researchers documented discrepancies between how human radiologists make diagnoses in screenings and how deep neural networks (DNNs) do. DNNs are stacks of layers of simulated computing elements (neurons); they learn by adjusting the calculations each layer performs based on the data they are fed.
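The idea of stacked layers of simulated neurons can be sketched in a few lines. This is a minimal illustration only, with assumed layer sizes and a placeholder sigmoid output; it is not the architecture used in the NYU study.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Simple nonlinearity applied between layers.
    return np.maximum(x, 0.0)

class TinyDNN:
    """A toy deep neural network: each layer is a learned linear
    transform followed by a nonlinearity, and layers are stacked."""

    def __init__(self, sizes):
        # Weights start as small random values ("random weight
        # initialization") and would be adjusted during training.
        self.weights = [rng.normal(0.0, 0.1, (m, n))
                        for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x):
        # Data flows through the layers; each layer's output feeds the next.
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = relu(x @ W + b)
        # Final layer: one score, squashed to a 0-1 probability.
        logits = x @ self.weights[-1] + self.biases[-1]
        return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical setup: 16 input features, two hidden layers, one output.
net = TinyDNN([16, 8, 4, 1])
prob = net.forward(rng.normal(size=16))
```

Training would consist of nudging `weights` and `biases` so that `prob` matches the labels in the training data; that optimization loop is omitted here.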
In an examination of malignant soft-tissue breast lesions, the scientists found that radiologists relied primarily on brightness and shape to make diagnoses, whereas DNNs relied on tiny details scattered across the images, many of them outside the regions that radiologists labeled as most important.
The researchers expect their findings will help providers better understand the potential and limitations that AI possesses in cancer diagnostics. A lack of knowledge about their diagnostic processes is what makes moving these applications into clinical workflows challenging. “The main advantages are that DNNs can diagnose quickly, do not suffer from fatigue and can be deployed anywhere in the world. The latter point is particularly useful in economically disadvantaged regions that lack access to radiologists. We want to improve the diagnostic ability of AI systems such that these advantages can be realized in the future,” lead author Taro Makino, a doctoral candidate in NYU’s Center for Data Science, told HCB News.
Using the NYU breast cancer screening dataset, the researchers selected 720 mammograms for ten radiologists with varying levels of experience to evaluate. The same mammograms were assessed by five DNNs trained from random weight initializations.
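"Trained from random weight initializations" means each of the five models started from a different random seed, so the trained networks differ even though the architecture and data are identical. The sketch below uses a stand-in for real training to show the pattern; the model and the averaging step are illustrative assumptions, not the study's actual setup.

```python
import random

def train_model(seed):
    """Stand-in for real training: returns 'weights' that depend on the
    random seed, as a trained network's parameters would."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(4)]

# Five independent runs from five different random initializations.
models = [train_model(seed) for seed in range(5)]

def ensemble_predict(models, x):
    # Predictions from independently initialized models are often
    # averaged (ensembled) to reduce seed-to-seed variance.
    scores = [sum(w * x for w in weights) for weights in models]
    return sum(scores) / len(scores)

avg_score = ensemble_predict(models, 0.5)
```

Comparing several such runs, rather than a single network, helps separate what the architecture reliably learns from quirks of any one initialization.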
They found that while DNNs take into account some of the same factors that radiologists do when making diagnoses, they do so to a limited extent. The authors did not evaluate whether one was better than the other.
Makino says to the best of his knowledge, this study is the first to compare the decision-making of AI and humans in the medical diagnostic setting. “I believe medical AI systems will become more capable as the scale of data sets and computers continue to increase. They are also likely to improve in tandem with broader progress in AI technology. However, we still need to better understand their decision-making in order to prevent catastrophic failures.”
Whether AI is more accurate at identifying cancer than human radiologists is a controversial debate in the field of medical imaging. A 2021 review of 12 studies, commissioned by the U.K. National Screening Committee, found insufficient evidence that it is. Assessing data from 131,822 screened women in Sweden, the U.S., Germany, the Netherlands and Spain, the British researchers found that the methods used in these studies were poor and that their applicability to European or U.K. breast cancer screening programs was low.
In three large studies, the majority of the AI systems evaluated (34 of 36, or 94%) were less accurate than a single radiologist, and all were less accurate than the consensus of two or more radiologists, which is standard practice in Europe. Five smaller studies showed the opposite, but the review authors noted that those trials carried a high risk of bias and that their results were not replicated in the larger studies.
The NYU study was supported by grants from the National Science Foundation and the National Institutes of Health.
The findings were published in Nature Scientific Reports.