Practical compression for multi-alignment genomic files

Rodrigo Canovas, Alistair Moffat

Abstract

Genomic sequence data is being generated in massive quantities, and must be stored in compressed form. Here we examine the combined challenge of storing such data compactly while still allowing bioinformatics researchers to extract particular regions of interest without fully decompressing multi-gigabyte data collections. We focus on data produced in SAM format, which is particularly voluminous, and describe storage techniques that combine compact storage with efficient extraction of selected regions.

Type

Conference paper

Publication

In Proc. Thirty-Sixth Australasian Computer Science Conference

Date

January 2013
