Data storage and compression using CRAM
As our database grows, storing and managing large genomic sequence data has become a challenge, particularly the large BAM files. As sequencing costs fall, we anticipate a surge of data as researchers switch from exome to whole-genome sequencing. In preparation, we have begun compressing our BAMs further using lossless CRAM compression. Our database system manages its disk space autonomously: when usage of our allocated disk space reaches a threshold of 80%, it automatically converts the oldest BAMs to CRAMs and archives them to tape storage, making way for newer datasets. Our testing has shown that the CRAM format saves roughly 30% of storage space compared with BAM, which will provide significant cost savings. Users who wish to access an archived BAM can click a button in our web interface, and the system will automatically restore the CRAM from tape and convert it back to BAM.
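The archiving policy described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the names `BamFile`, `select_bams_to_archive`, `bam_to_cram_cmd` and `cram_to_bam_cmd` are hypothetical, the file modification time is assumed to be a reasonable proxy for "oldest", and the conversion is assumed to go through `samtools view` with a reference FASTA. The commands are only built as argument lists here, not executed, and the tape archiving step is left out.

```python
from dataclasses import dataclass
from typing import List

# From the text: archiving starts once 80% of the allocated space is in use.
THRESHOLD = 0.80


@dataclass
class BamFile:
    """Hypothetical record for one BAM on disk."""
    path: str
    size_bytes: int
    mtime: float  # modification time, assumed here as the measure of age


def select_bams_to_archive(bams: List[BamFile], used_bytes: int,
                           allocated_bytes: int,
                           threshold: float = THRESHOLD) -> List[BamFile]:
    """Pick the oldest BAMs until projected disk usage falls below the threshold.

    Each selected BAM is assumed to leave the disk entirely, since its CRAM
    goes to tape, so its full size is subtracted from the projected usage.
    """
    target = threshold * allocated_bytes
    projected = used_bytes
    selected = []
    for bam in sorted(bams, key=lambda b: b.mtime):  # oldest first
        if projected < target:
            break
        selected.append(bam)
        projected -= bam.size_bytes
    return selected


def _swap_suffix(path: str, old: str, new: str) -> str:
    """Replace a trailing file extension, or append the new one if absent."""
    return path[:-len(old)] + new if path.endswith(old) else path + new


def bam_to_cram_cmd(bam_path: str, reference_fasta: str) -> List[str]:
    """Build a samtools command that writes a CRAM next to the BAM (assumed tool)."""
    cram_path = _swap_suffix(bam_path, ".bam", ".cram")
    return ["samtools", "view", "-C", "-T", reference_fasta,
            "-o", cram_path, bam_path]


def cram_to_bam_cmd(cram_path: str, reference_fasta: str) -> List[str]:
    """Build the reverse command, used after a CRAM is restored from tape."""
    bam_path = _swap_suffix(cram_path, ".cram", ".bam")
    return ["samtools", "view", "-b", "-T", reference_fasta,
            "-o", bam_path, cram_path]
```

For example, with 1000 bytes allocated and 850 in use, three BAMs of 100, 200 and 300 bytes (oldest first) would yield only the 100-byte file, since removing it brings usage to 750, below the 800-byte threshold. Each returned command list could then be passed to `subprocess.run` by a job that also handles the transfer to tape.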