The next great digital storage medium may be us, or our DNA, to be precise. Deoxyribonucleic acid stores the code that makes us humans and not, say, flatworms. That is to say, DNA is a remarkably evolved storage medium that can pack all the variety and complexity of organic life into a small amount of biological matter.
But turning DNA into storage for digital rather than biological information by artificial means is tough, because it has proven difficult to encode efficiently and reliably, say researchers at the EMBL-European Bioinformatics Institute (EMBL-EBI).
In the latest issue of Nature, EMBL-EBI researchers Nick Goldman and Ewan Birney explain that their breakthrough could make it possible to “store at least 100 million hours of high-definition video in about a cup of DNA.”
Goldman and Birney said they enlisted bio-analytics instrument maker Agilent Technologies, a Hewlett-Packard spin-off, to synthesize DNA from encoded digital information: in this case, an MP3 of Martin Luther King’s “I Have a Dream” speech, a .txt file of Shakespeare’s sonnets, a .pdf file containing James Watson and Francis Crick’s original paper describing the structure of DNA, and a final file describing the encoding itself.
“We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible,” Goldman explained. “So we figured, let’s break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn’t allow repeats. That way, you would have to have the same error on four different fragments for it to fail—and that would be very rare.”
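The idea Goldman describes can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the researchers' actual code: it assumes each byte is written as six base-3 digits, that each digit picks one of the three bases that differ from the previous base (so no letter ever repeats), and that fragments are 20 bases long and start every 5 bases so that interior bases sit on four fragments. The function names, the fragment sizes, and the plain integer index are all hypothetical choices made for the example.

```python
BASES = "ACGT"
# Translation table used to reverse-complement every other fragment.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bytes_to_trits(data: bytes) -> list:
    """Write each byte as six base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))  # most significant digit first
    return trits


def trits_to_bytes(trits) -> bytes:
    """Inverse of bytes_to_trits: fold each group of six digits into a byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def encode(data: bytes, start: str = "A") -> str:
    """Map each base-3 digit to one of the three bases that differ from
    the previous base, so the same letter never appears twice in a row."""
    prev, out = start, []
    for t in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def decode(dna: str, start: str = "A") -> bytes:
    """Recover the digits by asking which of the three choices was taken."""
    prev, trits = start, []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits_to_bytes(trits)


def fragments(dna: str, length: int = 20, step: int = 5) -> list:
    """Cut overlapping windows. With length == 4 * step, interior bases
    appear in four windows. Odd-numbered windows are reverse-complemented
    ("going in both directions"); the index marks where each one belongs.
    Trailing bases that do not fill a whole window are ignored here."""
    result = []
    for i, pos in enumerate(range(0, len(dna) - length + 1, step)):
        piece = dna[pos:pos + length]
        if i % 2:
            piece = piece.translate(COMPLEMENT)[::-1]
        result.append((i, piece))
    return result
```

Calling `encode(b"I have a dream")` yields a string of 84 bases with no two identical neighbors, `decode` returns the original bytes, and `fragments` splits the string into indexed, overlapping pieces. In this sketch the index is kept as a plain integer beside each fragment rather than being written into the DNA itself.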
The result was “hundreds of thousands of pieces of DNA” that looked “like a tiny piece of dust.” Agilent sent the synthesized sample back to the researchers at EMBL-EBI, where they sequenced it and said they decoded the files without errors.