What is the Arch Library in DNA Project?

The Arch Mission Foundation has announced a special collection of data stored in synthetic DNA. This special collection is called the “Arch Library in DNA Project” and is part of the Arch Mission Lunar Library initiative. The Lunar Library will contain a backup of human civilization, using new forms of big data storage technology that are durable for up to billions of years on the Moon. Molecular storage in DNA is one of the new technologies that will be included in the Lunar Library. The Lunar Library will also include data stored with other technologies, such as analog and digital data stored in nickel, digital data in quartz silica glass, and more.


Why send data to space in DNA “storage molecules,” and why is DNA well suited to storing big data in space?

Molecular storage in DNA uses the structure of synthetic, non-living DNA molecules (“DNA storage molecules”) to encode data. Because DNA molecules are extremely tiny and lightweight, can encode large amounts of data per molecule, and are inexpensive to replicate billions of times, they are an interesting new medium for backing up and transferring large data sets.


What is the difference between living and nonliving DNA?

Non-living DNA is a DNA molecule that does not encode any living organism and cannot reproduce on its own. An example is synthetic DNA designed in a lab that does not specify the genome of any organism.


Can nonliving DNA “storage molecules” reproduce in space or on the Moon?

No. It cannot reproduce at all. The only way to copy it is in a laboratory, using DNA replication technology.


Will the Arch Mission ever send DNA for living organisms to space?

We hope to. In the future we will announce plans to send a library of backup copies of the genomes of important organisms, such as the human genome, and more. That is a separate project from the one announced today. Today we are only announcing that we will send data written into synthetic DNA, to demonstrate the potential of molecular data storage for sending and storing big data sets in space as cost-effectively as possible.


Where will the DNA data storage molecules be stored?

The DNA storage molecules will be held inside thin sheets of encapsulation material (to protect and contain them), and these sheets will be inserted between layers of nickel that contain analog and digital data. All of this will be enclosed in a double-layer metallic box with a vacuum between the two layers of the shell (to insulate the inside of the box from heat). The contents of the payload will be protected from dust, most radiation, heat, chemical exposure (from rocket fuel), and micrometeorites (to the extent possible).


How is data written to DNA?

Starting with binary data, algorithms based on telecommunications and IT technologies transform the data into the language of life: A, C, G and T. The basic idea is to change from a base 2 system (a choice of 2 at each position: 0 or 1) to a base 4 system (a choice of 4 at each position: A, C, G or T). This gives an inherent efficiency: because each position has four possible values instead of two, a single nucleotide can carry up to two bits of information. The resulting DNA sequence encodes the digital information and is also amenable to standard biological processing such as copying and reading, allowing the data to be maintained and read efficiently.

For example, suppose for simplicity’s sake that A=00, C=01, G=10 and T=11. Then the digital file 010011110010 would be encoded in DNA as CATTAG. That sequence (CATTAG) would then be synthesized by Twist. In reality, the algorithms for encoding digital data are much more complex than this, but the example still illustrates the basic idea.
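
The toy mapping in this example can be sketched in a few lines of Python. This is a minimal illustration that assumes exactly the A=00, C=01, G=10, T=11 table above; the function names `encode_bits` and `encode_bytes` are hypothetical, and this is not the encoder used by the project or by Twist.

```python
# Toy encoder for the A=00, C=01, G=10, T=11 example: two bits per nucleotide.
# Hypothetical illustration only; real DNA-storage encoders are far more complex.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_bits(bits: str) -> str:
    """Convert a string of 0s and 1s into DNA bases, two bits at a time."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    if set(bits) - {"0", "1"}:
        raise ValueError("bit string may contain only '0' and '1'")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def encode_bytes(data: bytes) -> str:
    """Encode raw bytes by first writing each byte as 8 bits (4 bases per byte)."""
    return encode_bits("".join(f"{byte:08b}" for byte in data))


print(encode_bits("010011110010"))  # CATTAG
print(encode_bytes(b"Hi"))          # CAGACGGC
```

Each byte becomes four nucleotides, which is the two-bits-per-base efficiency of moving from base 2 to base 4.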


How is data read out from synthetic DNA exactly?

The data is "read" by first sequencing the DNA, which is common practice today and is done on standard sequencing machines. Sequencing the piece of DNA described above, for example, would show that its sequence is CATTAG. That sequence is then fed back into the algorithm to convert it into digital data (the reverse of encoding).
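
Reversing the toy A=00, C=01, G=10, T=11 mapping from the encoding example can be sketched the same way. This minimal sketch assumes the sequencer output is already a clean string of A, C, G and T with no read errors, which real pipelines cannot assume; `decode_sequence` is a hypothetical name, not part of any published tool.

```python
# Toy decoder: the reverse of the A=00, C=01, G=10, T=11 example mapping.
# Assumes an error-free read; real pipelines must also handle sequencing errors.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def decode_sequence(seq: str) -> str:
    """Turn a sequenced DNA string back into the original bit string."""
    bits = []
    for base in seq.upper():
        if base not in BASE_TO_BITS:
            raise ValueError(f"unexpected character in sequence: {base!r}")
        bits.append(BASE_TO_BITS[base])
    return "".join(bits)


print(decode_sequence("CATTAG"))  # 010011110010
```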


What is the encoding scheme? How hard is it to decode? Can anyone figure it out or do they need a key?

The data is encoded with a special algorithm specific to this encoding. Unlike the example above, it is not simplistic, and decoding the data requires the same specialized algorithm that was used to encode it.
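
The sketch below is not the project's algorithm, which this FAQ does not describe. It only illustrates one reason real encoders are more involved than the simple example above: that mapping can place the same base twice in a row (the "TT" in CATTAG), and runs of a single base are harder to synthesize and sequence accurately. One technique described in the DNA-storage research literature is a "rotating" code that writes data as base-3 digits and always picks one of the three bases that differ from the previous base. The function names `encode_trits` and `decode_trits` and the assumed starting base of A are hypothetical choices for this illustration.

```python
from __future__ import annotations

# Illustrative rotating code in the style of published DNA-storage schemes.
# Assumption for illustration only: NOT the algorithm used by this project.
BASES = "ACGT"


def encode_trits(trits: list[int], prev: str = "A") -> str:
    """Map base-3 digits to DNA so that no base repeats its predecessor."""
    out = []
    for t in trits:
        if t not in (0, 1, 2):
            raise ValueError("trits must be 0, 1 or 2")
        # Skip over the previous base, then step t more positions around ACGT,
        # so the result is always one of the three bases that differ from prev.
        prev = BASES[(BASES.index(prev) + 1 + t) % 4]
        out.append(prev)
    return "".join(out)


def decode_trits(seq: str, prev: str = "A") -> list[int]:
    """Invert encode_trits; the decoder must know the same rules and start base."""
    trits = []
    for base in seq:
        t = (BASES.index(base) - BASES.index(prev) - 1) % 4
        if t == 3:  # only happens when base == prev, which the encoder never emits
            raise ValueError("repeated base: not a valid rotating-code sequence")
        trits.append(t)
        prev = base
    return trits


print(encode_trits([1, 0, 2, 2, 1]))  # GTGCT
print(decode_trits("GTGCT"))          # [1, 0, 2, 2, 1]
```

This trades some density (each base carries one base-3 digit, about 1.58 bits, instead of two bits) for sequences with no repeated neighbors. It also shows why the data cannot simply be read off: without knowing the rotation rule and the starting base, the same sequence decodes to different digits.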


How will a recipient in the future know the DNA contains data?

The Arch Mission payloads provide visual indicators and microscopic visual instructions about what they contain, so that future recipients can figure them out. They will also contain microscopic analog image instructions for locating, extracting and decoding the data written onto our collection of synthetic DNA storage molecules.


Can I include my data in the Lunar Library DNA Special Collection?

Yes! The Arch Mission Foundation, in partnership with Microsoft and the Molecular Information Systems Laboratory at the University of Washington in Seattle, will take a selection of images we receive and encode them in DNA to be archived on the surface of the Moon!

You can submit an original photo to the #MemoriesInDNA Project to support the development of a next-generation storage and retrieval system for digital data based on nature’s own perfected data storage system: DNA!