When I was in high school in the early 2000s, I spent a week of my summer vacation shadowing a pathologist at the local hospital. Every day in his basement office was basically the same; he’d focus his microscope on a slide of tissue, squinting for minutes at a time, methodically making notes about the shape of the cells, their size, their surroundings. When he had enough data points he’d make the phone call: “Squamous cell carcinoma.” “Serrated adenocarcinoma.” “Benign.”
For decades, doctors have relied on the well-trained eyes of human pathologists to give their patients a cancer diagnosis. Now, researchers are teaching machines to do that time-intensive work in as little as a few seconds.
In new research published today in Nature Medicine, scientists at New York University re-trained an off-the-shelf Google deep learning algorithm to distinguish between two of the most common types of lung cancers with 97 percent accuracy. This type of AI—the same tech that identifies faces, animals, and objects in pictures uploaded to Google’s online services—has proven adept at diagnosing disease before, including diabetic blindness and heart conditions. But NYU’s neural network learned how to do something no pathologist has ever done: identify the genetic mutations teeming inside each tumor from just a picture.
“I thought the real novelty would be not just to show the AI is as good as humans, but to have it provide insights a human expert would not be able to,” says Aristotelis Tsirigos, a pathologist at the NYU School of Medicine and a lead author on the new study.
To do so, Tsirigos’ team started with Google’s Inception v3—an open-source algorithm that Google trained to identify 1000 different classes of objects. To teach the algorithm to distinguish between images of cancerous and healthy tissue, the researchers showed it hundreds of thousands of images taken from The Cancer Genome Atlas, a public library of patient tissue samples.
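The study's exact training recipe is not described here, so what follows is only a minimal Python sketch of one common way to reuse a pretrained network: keep its existing layers frozen as a feature extractor and train a small new output layer on top. Everything in it is a hypothetical stand-in. `frozen_features` replaces the real network with two hand-picked statistics, `make_tile` generates synthetic numbers rather than slide images, and the learning rate and epoch count are arbitrary.

```python
import math
import random


def frozen_features(tile):
    """Stand-in for the pretrained layers: a fixed mapping that is never updated."""
    mean = sum(tile) / len(tile)
    spread = max(tile) - min(tile)
    return [mean, spread]


def predict(weights, bias, features):
    """New output layer: logistic regression on top of the frozen features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=500, lr=0.5):
    """Train only the new output layer with plain stochastic gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for tile, label in examples:
            feats = frozen_features(tile)
            err = predict(weights, bias, feats) - label
            weights = [w - lr * err * x for w, x in zip(weights, feats)]
            bias -= lr * err
    return weights, bias


def make_tile(cancerous):
    """Synthetic 16-value 'tile'; label 1 tiles are brighter on average (an assumption)."""
    base = 0.7 if cancerous else 0.3
    return [base + random.uniform(-0.1, 0.1) for _ in range(16)]


random.seed(0)
train_set = [(make_tile(label), label) for label in [0, 1] * 20]
held_out = [(make_tile(label), label) for label in [0, 1] * 5]

weights, bias = train_head(train_set)
correct = sum(
    (predict(weights, bias, frozen_features(tile)) > 0.5) == bool(label)
    for tile, label in held_out
)
accuracy = correct / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

The point of the design is that the expensive part, the feature extractor, is reused as-is, while only the small final layer has to learn the new task. A real project would swap the toy functions for an actual pretrained network and real image tiles.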
Once Inception figured out how to pick out cancerous cells with 99 percent accuracy, the next step was teaching it to tell two kinds of lung cancer apart: adenocarcinoma and squamous cell carcinoma. Together, they represent the most prevalent forms of the disease, which kills more than 150,000 people a year in the US. While they appear frustratingly similar under the microscope, the two cancer types are treated very differently. Getting it right can mean the difference between life and death for patients.
When the researchers tested Inception on independent samples taken from cancer patients at NYU, its accuracy went down a bit, but not much. It still correctly diagnosed the images between 83 and 97 percent of the time. That’s not surprising, says Tsirigos, given that the hospital’s samples carried much more noise—inflammation, dead tissue, and white blood cells—and were often processed differently than the frozen TCGA samples. Improving the accuracy will just be a matter of having pathologists annotate slides with more of those additional features, so the algorithm can learn to pick those out too.
But it wasn’t a helping human hand that taught Inception to ‘see’ genetic mutations in those histology slides. That trick the algorithm learned all on its own.
Again working with data from the TCGA, Tsirigos’ team fed Inception genetic profiles for each tumor, along with the slide images. When they tested their system on new images, it was able to identify not only which ones showed cancerous tissue, but also the genetic mutations of that particular tissue sample. The neural network had learned to notice extremely subtle changes to a tumor sample’s appearance, which pathologists cannot see. “These cancer-driving mutations appear to have microscopic effects that the algorithm can detect,” says Tsirigos. What those subtle changes are, however, “we don’t know. They’re buried [in the algorithm] and nobody really knows how to extract them.”
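Because one tumor can carry several mutations at once, a prediction of this kind is usually framed as a multi-label problem, with each gene getting its own yes-or-no probability. The Python sketch below illustrates that framing only, and it rests on assumptions: the gene names, the raw scores, and the 0.5 cutoff are all made up for illustration and do not come from the study.

```python
import math


def sigmoid(score):
    """Squash a raw network score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def call_mutations(scores, threshold=0.5):
    """Score each gene independently; several genes may be called for one image."""
    probabilities = {gene: sigmoid(score) for gene, score in scores.items()}
    called = sorted(gene for gene, p in probabilities.items() if p >= threshold)
    return probabilities, called


# Hypothetical raw outputs for one slide image (placeholder gene names).
example_scores = {"gene_A": 2.0, "gene_B": -1.5, "gene_C": 0.3}
probabilities, called = call_mutations(example_scores)

for gene, p in probabilities.items():
    print(f"{gene}: {p:.2f}")
print("predicted mutated:", called)
```

Unlike a single diagnosis, where the classes compete and exactly one wins, the per-gene probabilities here are independent, so they do not need to add up to one.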
This is the black box problem of deep learning, but it’s especially pressing in medicine. Critics argue that these algorithms must first be made more transparent to their creators before going into widespread use. Otherwise, how will anyone be able to catch their inevitable failures, which may be the difference between a patient living and dying? But people like Olivier Elemento, director of the Caryl and Israel Englander Institute for Precision Medicine at Cornell, say it’d be stupid not to use a clinical test that gets the answers right 99 percent of the time, even without knowing how it works.
“Honestly, for an algorithm of this kind to be in a clinical test, it doesn’t need to have fully interpretable features, it just has to be reliable,” says Elemento. But getting near-perfect reliability isn’t so easy. Different hospitals handle their tumor samples using different instruments and protocols. Teaching one algorithm to navigate all that variability will be a steep task indeed.
But that’s what Tsirigos and his team plan to do. In the coming months, the researchers will keep training their AI program with more data from more varied sources. Then they’ll start thinking about spinning up a company to seek FDA approval. Because of cost and time, sequencing of tumor samples isn’t always the standard of care in the US. Imagine being able to send in a digital photo of a tumor sample and get a diagnosis complete with viable treatment options almost instantaneously. That’s where this is all headed.
“The big question is, will this be trustworthy enough to replace current practice?” says Daniel Rubin, Director of Biomedical Informatics at the Stanford Cancer Institute. Not without a lot of future validation work, he says. But it does point toward a future where pathologists work in partnership with computers. “What this paper really shows is that there’s a lot more information in the images than what a human being can pull out.”
That’s a theme beyond just digital pathology. With Google and other companies making state-of-the-art algorithms available as open-source code, researchers can now start an AI project of their own with relative ease. With just a bit of customization, those neural nets are ready to be set loose on a mountain of biomedical image data, not just tumor images.
I ask Tsirigos if he’s had any trouble finding fellow pathologists to volunteer to train his cancer classifier. He laughs. In the beginning, he says, he was afraid to ask anyone at NYU to join the project. After all, they’d be helping to create a future competitor. But in the end, recruitment proved easy. People were curious to see what Inception could do. Not just for lung cancer, but for their own projects too. They’re not worried about being replaced, says Tsirigos; they’re excited about being able to ask deeper questions because the machine is taking care of the simple ones. Leave the object recognition to the machines, and there’s still plenty of medicine left for the humans.