ACM-BCB 2016: Tutorial on "Combinatorial methods for nucleic acid sequence analysis"

Description

By deciphering the sequences of genomes, we are able to determine the ‘blueprint’ of how our cells function. Unfortunately, while our genomes are polymers of billions of nucleotides, methods for reading sequences are limited to reads of hundreds to thousands of nucleotides. To determine the sequence of a genome, many small fragments of DNA are read, and the genome is inferred through ‘de novo’ fragment assembly, in which these short fragments are stitched together to reconstruct the entire genome. In this tutorial, we will discuss information-theoretic barriers to, and algorithmic methods for, reconstructing DNA, along with the allied combinatorial problems involved in resolving genome structure. In particular, we will discuss the following aspects in detail.
  1. The architecture of human genomes and how this creates challenges for fragment assembly.
  2. The characteristics of high-throughput sequencing data.
  3. Information-theoretic conditions for fragment assembly.
  4. Combinatorial methods for de novo fragment assembly, including novel challenges for assembling reads from third-generation long-read sequencers.
  5. Challenges in RNA sequence assembly.
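
To make the idea of stitching fragments together concrete, here is a minimal sketch of greedy overlap-based assembly. It assumes error-free reads from a sequence with no repeats longer than the minimum overlap; the function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the toy sequence are illustrative assumptions and are not taken from the tutorial. Greedy merging is only one simple heuristic and is not presented here as the method of any assembler listed in the references.

```python
def overlap(a, b, min_len=1):
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest suffix-prefix overlap.

    Returns the list of contigs left when no pair overlaps by at least `min_len`.
    """
    # Drop exact duplicates (keeping order) and reads contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]

    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: the contigs cannot be joined
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Toy example: length-6 reads sampled every 2 bases from a 14-base sequence.
genome = "ATGGCGTGCAATCC"
reads = [genome[i:i + 6] for i in range(0, len(genome) - 5, 2)]
contigs = greedy_assemble(reads)
print(contigs)
```

On real data this heuristic breaks down: sequencing errors prevent exact suffix-prefix matches, and repeats longer than the reads make the correct merge order ambiguous. Those are the challenges that items 1 to 4 above address.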

Intended Audience

The intended audience for the tutorial includes researchers in computational biology as well as in algorithmic methods. The tutorial assumes no prior background, and thus can serve as an introduction to this area.

Speaker Bios

Mark Chaisson

Mark Chaisson has been a postdoctoral scholar in the Eichler lab at the University of Washington, Seattle since 2012, where he has been developing methods to detect structural variation and perform de novo genome assembly using single molecule sequencing (SMS). Before joining Dr. Eichler's lab, he spent three years at Pacific Biosciences as a senior algorithms engineer developing the BLASR method for mapping SMS reads. He received his Ph.D. in Bioinformatics from the University of California, San Diego in the lab of Dr. Pavel Pevzner, where he developed methods for de novo assembly of data from the very first 'Next Generation' (Illumina and 454) high-throughput sequencers. Webpage

Sreeram Kannan

Sreeram Kannan has been an assistant professor at the University of Washington, Seattle since October 2014. He was a postdoctoral scholar at the University of California, Berkeley from 2012 to 2014, before which he received his Ph.D. in Electrical Engineering and M.S. in Mathematics from the University of Illinois at Urbana-Champaign. He is a recipient of the Van Valkenburg outstanding dissertation award from UIUC, 2013, a co-recipient of the Qualcomm Cognitive Radio Contest first prize, 2010, a recipient of the Qualcomm (CTO) Roberto Padovani outstanding intern award, 2010, a recipient of the gold medal from the Indian Institute of Science, 2008, and a co-recipient of the Intel India Student Research Contest first prize, 2006. His recent research interests include the applications of information theory and learning algorithms to computational biology and networks. Webpage

References

  • Berlin, Konstantin, et al. "Assembling large genomes with single-molecule sequencing and locality-sensitive hashing." Nature biotechnology 33.6 (2015): 623-630.
  • Myers, Gene. "Efficient local alignment discovery amongst noisy long reads." International Workshop on Algorithms in Bioinformatics. Springer Berlin Heidelberg, 2014.
  • Pevzner, Pavel A., Haixu Tang, and Glenn Tesler. "De novo repeat classification and fragment assembly." Genome research 14.9 (2004): 1786-1796.
  • Kamath, Govinda M., et al. "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution." bioRxiv (2016): 062117.
  • Steinberg, Karyn Meltz, et al. "Structural diversity and African origin of the 17q21.31 inversion polymorphism." Nature genetics 44.8 (2012): 872-880.
  • Hastings, P. J., et al. "Mechanisms of change in gene copy number." Nature Reviews Genetics 10.8 (2009): 551-564.
  • Chaisson, Mark J., and Glenn Tesler. "Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory." BMC bioinformatics 13.1 (2012): 238.
  • Chaisson, Mark JP, Richard K. Wilson, and Evan E. Eichler. "Genetic variation and the de novo assembly of human genomes." Nature Reviews Genetics (2015).
  • Kannan, Sreeram, et al. "Shannon: An Information-Optimal de Novo RNA-Seq Assembler." bioRxiv (2016): 039230.
  • Bresler, Guy, Ma'ayan Bresler, and David Tse. "Optimal assembly for high throughput shotgun sequencing." BMC bioinformatics 14.5 (2013): 1.
  • Chen, Yuxin, et al. "Community Recovery in Graphs with Locality." arXiv preprint arXiv:1602.03828 (2016).
  • Shomorony, Ilan, et al. "Information-optimal genome assembly via sparse read-overlap graphs." Bioinformatics 32.17 (2016): i494-i502.