Selected Publications

Next-generation sequencing machines produce vast amounts of genomic data. For the data to be useful, it must be stored and manipulated efficiently. This work responds to the combined challenge of compressing genomic data while providing fast access to regions of interest without decompressing whole files.
Bioinformatics, 2016

A general tree of n nodes can be represented in pointer form, requiring O(n log n) bits, whereas the succinct representations we studied require just 2n + o(n) bits and carry out many sophisticated operations in constant time. Yet, there is no exhaustive study in the literature comparing the practical magnitudes of the o(n)-space and the O(1)-time terms.
In ALENEX’10, 2010

Some Publications

LZ78 Compression in Low Main Memory Space. SPIRE, 2017.

Full Compressed Affix Tree Representations. In Proc. DCC’17, 2017.

CSAM: Compressed SAM format. Bioinformatics, 2016.

Practical compressed string dictionaries. Information Systems, 2016.

Shortest DNA cyclic cover in compressed space. In Proc. DCC’16, 2016.

Lossy compression of quality scores in genomic data. Bioinformatics, 2014.

Practical compressed suffix trees. Algorithms, 2013.

Practical compression for multi-alignment genomic files. In Proc. Thirty-Sixth Australasian Computer Science Conference, 2013.

Compression of RDF dictionaries. In Proc. 27th Annual ACM Symposium on Applied Computing, ACM SIGAPP Applied Computing Review, 2012.

Succinct trees in practice. In ALENEX’10, 2010.

Courses

Courses taken on Udemy and StackSkills

Projects

Affix Trees

A generalization of the suffix tree that supports full tree functionality in both search directions

COvI

Compressed Overlap Index

CSAM

Compressed SAM format

Compressed Suffix Tree

A Compressed Suffix Tree implementation

TMPred

TMPred