Darpa Wants to Build an Image Search Engine out of DNA

Most people use Google's search-by-image feature either to look for copyright infringement or to shop. See some shoes you like on a frenemy's Instagram? Search will pull up all the matching images on the web, including from sites that will sell you the same pair. In order to do that, Google’s computer vision algorithms had to be trained to extract identifying features like colors, textures, and shapes from a vast catalogue of images. Luis Ceze, a computer scientist at the University of Washington, wants to encode that same process directly in DNA, making the molecules themselves carry out that computer vision work. And he wants to do it using your photos.

On Wednesday, Ceze’s team at UW launched a social media campaign to collect 10,000 images from around the world and preserve their pixels in the As, Ts, Cs, and Gs that make up the building blocks of life. They’ve done this sort of thing before; in 2016 they encoded an entire OK Go music video, setting a record for the largest amount of data stored in DNA. But this time they decided to crowdsource the data, building a website where people can submit photos and encouraging people to share their images on social media with the hashtag #MemoriesInDNA. “DNA can last thousands of years,” says Ceze. “So this is essentially a time capsule. What do you want to preserve forever?”
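
To make the idea of pixels stored as As, Ts, Cs, and Gs concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base, and the function names `bytes_to_dna` and `dna_to_bytes` are hypothetical. This is an illustration only, not the UW team's encoding, which the article does not describe; practical schemes would also need error correction and other safeguards.

```python
# Illustrative mapping only: each base carries two bits (A=00, C=01, G=10, T=11).
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


# One RGB pixel (red=255, green=0, blue=128) becomes twelve bases.
pixel = bytes([255, 0, 128])
strand = bytes_to_dna(pixel)
print(strand)  # TTTTAAAAGAAA
assert dna_to_bytes(strand) == pixel
```

At two bits per base, a single byte takes four bases, so even a small photo translates into a very long sequence that has to be split across many short strands.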

UW’s #MemoriesInDNA campaign might be a bit of a gimmick (there are plenty of available, high-quality image databases on which to train a molecular search engine). But the science behind it is a very real attempt to upend the last six decades of computing. DNA-based storage has so far been good only for that: encoding pixels and locking them up in freeze-dried strands invisible to the human eye. So far, no one’s figured out how to retrieve and process DNA-stored data—a necessary first step for creating any kind of serious molecular computing platform.

Who would want that, exactly? Well, Darpa for one.

In the last few months, the Department of Defense agency tasked with funding science’s most far-out hopes has begun investing millions in discovering radical, non-binary ways to work with data. “Molecules offer a very different approach to ‘computing’ than the 0s and 1s of our existing digital systems,” says Anne Fischer, program manager for Darpa’s Molecular Informatics program, which has so far awarded $15.3 million to projects at Harvard, Brown, the University of Illinois, and the University of Washington. “The global community is creating data at a tremendous rate, and developing new approaches to access and process this information is critical to address looming shortfalls in storage capacity and computational speed.”

The digital age began with a simple act of delegation: man outsourcing memory to machine. First in vacuum tubes, then in transistors, tape, disks, and flash drives. After more than 60 years, the essential logic-based architecture described by John von Neumann still undergirds modern computing infrastructures. And by any measure it has served humanity well. But its limits are becoming apparent as humans create ever more complex data.

“Moore’s Law has been all about miniaturization of devices,” says Karin Strauss, a senior scientist at Microsoft and collaborator on the UW project. “Electronics are great and will continue to exist, of course, but molecules are the final frontier when it comes to miniaturization.” Chemistry offers an untapped palette of molecular diversity—properties such as structure, size, charge, and polarity—that could be harnessed for information processing.

In the case of DNA, it’s the structure that does the heavy lifting. Strauss will be working with Ceze to first extract all the visual features from the crowdsourced images, and then map them into strings of As, Ts, Cs, and Gs. Each photo might get tens of thousands of unique DNA segments, each one encoding a curve, or a vertical line, or a patch of blue. Then they can introduce a coded “query,” just the way you would type a few keywords into Google search. Except this query would be a string of DNA that corresponds to some of those visual features. And each query sequence would get a special coating of magnetic nanoparticles.

Drop a few of those in a microtest tube of DNA, where 10,000 images are stored in a few milliliters, and they’ll grab all the sequences that are a match. Then you just need a magnet to haul them out and a sequencer and some more algorithms to turn them back into visual images.
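
The retrieval step described above can be mimicked in software. The sketch below is a toy model in Python, not the team's method: it assumes a query strand "grabs" any stored strand containing a region complementary to it, tolerating a small number of mismatched bases. The names `binds` and `pull_down`, the tiny pool of sequences, and the mismatch threshold are all hypothetical.

```python
# Toy model of pulling matching strands out of a pool with a DNA query.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that would pair with seq (A with T, C with G)."""
    return seq.translate(COMPLEMENT)[::-1]


def binds(probe: str, strand: str, max_mismatches: int = 1) -> bool:
    """True if some window of strand pairs with probe within the mismatch limit."""
    target = reverse_complement(probe)  # the sequence the probe pairs with
    n = len(target)
    for i in range(len(strand) - n + 1):
        window = strand[i:i + n]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            return True
    return False


def pull_down(probe: str, pool: dict, max_mismatches: int = 1) -> list:
    """Stand-in for the magnet: names of strands the probe would capture."""
    return [name for name, strand in pool.items()
            if binds(probe, strand, max_mismatches)]


# Hypothetical pool: each strand stands for one visual feature of one photo.
pool = {
    "photo_a": "CCGGGATTACAGTTAA",  # contains the feature GATTACAG exactly
    "photo_b": "CCGGGATTACACTTAA",  # same feature with one base changed
    "photo_c": "ACACACACACACACAC",  # unrelated
}
probe = reverse_complement("GATTACAG")  # query for that feature

print(pull_down(probe, pool, max_mismatches=0))  # ['photo_a']
print(pull_down(probe, pool, max_mismatches=1))  # ['photo_a', 'photo_b']
```

In a test tube every query molecule checks every stored strand at once, whereas this loop checks them one at a time; that parallelism is part of the appeal of doing the search in chemistry.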

That’s how they hope it will work, anyway. “The core of the Darpa project is figuring out which mechanisms are best equipped to do molecular processing,” says Ceze. “We’re focusing on visual data because it’s by far the largest type of data in the world. And we think DNA’s specific binding properties make it well-suited for that. But we’ll see.”

Other researchers are leveraging different physical properties of DNA to encode information. Olgica Milenkovic’s group at the University of Illinois isn’t manufacturing huge amounts of synthetic DNA, but rather making small cuts in naturally occurring bacterial DNA. These changes can be counted, which makes them essentially addition and subtraction operations, among the basic building blocks of programming languages like Java.

And DNA isn’t the only molecule Darpa is interested in. Brenda Rubenstein is a theoretical chemist at Brown, where she’s worked on quantum computing, which encodes bits of information in atoms, ions, photons, or electrons. But now she’s extending that idea to organic compounds, specifically ones that have multiple places to attach R-groups, the variable parts of molecules that lend them different physical and chemical properties. Running different reactions modifies those R-groups in predictable ways, which makes them good for basic linear algebra computations, says Rubenstein. “They have so many properties, there’s an incredible capacity for storing and processing information,” she says. “I think small molecules are almost an obvious choice for broadening the scope of computing.”

Molecules like DNA might prove to have some serious advantages over the in silico state of the art: they’ve got way denser storage potential, last way longer, and may even be able to process way more in parallel. But they’re not a silver bullet. DNA, like computer code, can still be hacked. And it’s hard to see how you’d get a soup of small molecule reactions crammed under the hood of your smartphone. But it’s fun to at least imagine that years from now the Department of Defense might be building underground bunkers, not for server farms, but for trays of microscopic glass beads, with a nation’s secrets held in freeze-dried DNA.
