By John R. Fischer, Senior Reporter | September 06, 2022
Suboptimal practices in the development of machine learning systems put them at risk of producing biased insights when applied in radiology.
But researchers at Mayo Clinic have proposed several strategies for addressing these developmental problems and reducing the risk of biased information. The first of their three reports focuses on the data handling process and 12 suboptimal practices associated with it.
"If these systematic biases are unrecognized or not accurately quantified, suboptimal results will ensue, limiting the application of AI to real-world scenarios," said Dr. Bradley Erickson, professor of radiology and director of the AI Lab at Mayo Clinic in Rochester, Minnesota, in a statement.
The data handling process consists of data collection, data investigation, data splitting and data engineering. The issues afflicting this phase include:
- Data collection – improper identification of the data set, single source of data, unreliable source of data
- Data investigation – inadequate exploratory data analysis, exploratory data analysis with no domain expertise, failing to observe actual data
- Data splitting – leakage between data sets, unrepresentative data sets, overfitting to hyperparameters
- Data engineering – improper feature removal, improper feature rescaling, mismanagement of missing data
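As a rough illustration of two items on that list, leakage between data sets and improper feature rescaling, the Python sketch below splits records by patient rather than by image and computes rescaling statistics from the training set only. It is not taken from the Mayo Clinic reports; the field names `patient_id` and `intensity`, the helper functions and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both train and test."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    # Shuffle patients, not individual images, to avoid leakage.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [r for pid in patient_ids if pid not in test_ids for r in by_patient[pid]]
    test = [r for pid in patient_ids if pid in test_ids for r in by_patient[pid]]
    return train, test


def fit_scaler(values):
    """Return the mean and standard deviation of the given values."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0  # guard against a constant feature
    return mean, std


# Hypothetical toy data: 10 patients with 3 images each.
records = [{"patient_id": f"P{i // 3}", "intensity": float(i)} for i in range(30)]

train, test = split_by_patient(records)

# Rescaling statistics come from the training set only, then are reused on test.
mean, std = fit_scaler([r["intensity"] for r in train])
train_scaled = [(r["intensity"] - mean) / std for r in train]
test_scaled = [(r["intensity"] - mean) / std for r in test]
```

Splitting on the patient identifier keeps multiple images from the same person on one side of the split, and reusing the training mean and standard deviation keeps information about the test set out of preprocessing.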
The researchers recommend reviewing the clinical and technical literature in depth and working with data science experts to plan data collection. To keep data sets diverse, they also advise drawing on multiple institutions in different countries and regions, using data from different vendors and time periods, or including public data sets.
"Creating a robust machine learning system requires researchers to do detective work and look for ways in which the data may be fooling you," said Erickson. "Before you put data into the training module, you must analyze it to ensure it's reflective of your target population. AI won't do it for you."
The second and third reports discuss biases that occur when developing and evaluating the model, and when reporting findings.
The findings were published in Radiology: Artificial Intelligence, a journal of the Radiological Society of North America.