Novel algorithm automatically removes poor-quality data to improve AI scalability

September 17, 2020 | Artificial Intelligence
SAN FRANCISCO, Sept. 15, 2020 /PRNewswire/ -- AI healthcare company Presagen has developed a novel technique that automatically cleans poor-quality data from the datasets used to train scalable and reliable artificial intelligence products.

The patent-pending technique, called UDC, was applied to four types of imaging problems: detection of pneumonia in chest x-rays; embryo viability assessment in IVF; detection of cats and dogs; and detection of various types of vehicles.

Illustration - Data Quality

In each case, the UDC reliably detected poor-quality data on its own. Removing the poor-quality training data significantly improved the accuracy (by more than 20% in some cases) and the generalizability of the AI, both of which are necessary for commercial scale and reliability. Lack of scalability is a known problem with AI products and was highlighted in a recent VentureBeat article by prominent Silicon Valley VC firm Andreessen Horowitz.

Dr Michelle Perugini, Presagen Co-Founder and CEO, said, "Real-world problems like healthcare are not Kaggle competitions. Data are inherently of poor quality due to clinical subjectivity, uncertainty, and even adversarial attacks where data contributors intentionally contribute poor-quality data. It is not always possible to reliably detect errors in data, even for experts. We have seen that even 1% poor-quality data can impact AI training stability and performance. This ground-breaking technique can automatically detect poor-quality data and allows us to build robust commercial AI products."

The UDC can also be used to clean "test data", the data used to validate AI accuracy. That accuracy is often publicly reported to clinics and patients to describe the efficacy of the AI.

Presagen Co-Founder and Chief Strategy Officer, Dr Don Perugini, said, "It is critical to ensure that test data are clean so that the reported accuracy is a true representation of the AI performance, and not misleading for clinics and patients that need to rely on it. With embryo viability assessment, we have seen literature reporting accuracies above 90%; however, we have detected over 10% inherent poor-quality data due to the nature of the problem. This calls into question these very high reported accuracies."
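
To illustrate why noisy test data can undermine a reported accuracy, here is a minimal sketch in Python. It is not Presagen's method. It assumes a binary task, that "poor-quality" simply means the test label is wrong, and that the model's mistakes are independent of the label errors; the function name observed_accuracy is hypothetical. Under those assumptions, a model that is truly correct with probability a agrees with a noisy label with probability a(1 - p) + (1 - a)p, where p is the fraction of wrong labels.

```python
def observed_accuracy(true_accuracy: float, label_error_rate: float) -> float:
    """Accuracy measured against noisy binary test labels.

    Assumption (not from the press release): model errors are
    independent of label errors, and a wrong label is a flipped label.
    """
    if not 0.0 <= true_accuracy <= 1.0:
        raise ValueError("true_accuracy must be between 0 and 1")
    if not 0.0 <= label_error_rate <= 1.0:
        raise ValueError("label_error_rate must be between 0 and 1")

    # Model is right and the label is right: counted as correct.
    agree_on_clean = true_accuracy * (1.0 - label_error_rate)
    # Model is wrong and the label is also wrong: in a binary task the
    # two errors coincide, so this is (misleadingly) counted as correct.
    agree_on_flipped = (1.0 - true_accuracy) * label_error_rate
    return agree_on_clean + agree_on_flipped


if __name__ == "__main__":
    # Illustrative values only; 10% echoes the figure quoted above.
    for a in (0.80, 0.90, 1.00):
        measured = observed_accuracy(a, 0.10)
        print(f"true accuracy {a:.2f} -> measured {measured:.2f}")
```

Under these assumptions, even a perfect model would measure only about 90% against a test set with 10% wrong labels, so a measured accuracy above that level would suggest either cleaner test data or a model that has fit the label errors.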

Removal of poor-quality data using the UDC has a range of other significant benefits to the field of AI.

Dr Jonathan Hall, Presagen's Co-Founder and Chief Scientist, said, "Removing poor-quality data with UDC reduces the quantity of data needed to train the AI, which is important because high-quality labeled data can be hard to come by. It also reduces the cost and time to train the AI. However, the most exciting benefit of the UDC is greater stability and accuracy within the AI training process itself, which means the AI training process can potentially be automated with little or no human oversight, thus protecting people's rights to privacy."
