Research: Sreeram Kannan

My research interests are primarily centered around information theory and its applications to computational biology and wireless networks. The following links will take you to the appropriate subsections.

Computational Biology

Fundamental Limits and Optimal Algorithms for Transcriptome Assembly

Transcriptome Assembly 

High throughput sequencing of RNA transcripts (called RNA-seq) has emerged in recent years as a powerful method that enables discovery of novel transcripts and alternatively spliced isoforms of genes, along with accurate estimates of gene expression. A key step in RNA-seq is assembly of transcripts from short reads obtained using shotgun sequencing. While there are some existing assemblers, they do not always agree on the reconstructed transcripts even when the number of reads is large. Therefore we raise the following fundamental question: when is there enough information to do accurate transcriptome reconstruction? We establish the information theoretic limits on this assembly problem in the limit of large number of reads. We also propose a new algorithm for transcriptome recovery that has near optimal performance, and yet has low complexity. One aspect of the proposed algorithm is derived from a new method we propose for decomposing a flow into the fewest number of paths. In this ongoing project, we are in the process of building a software system capable of high performance transcriptome assembly. Some open theoretical questions we are investigating include the role of a reference genome, a reference transcriptome, and the utility of long reads.

S. Kannan, L. Pachter and D. Tse, Fundamental Limits and Optimal Algorithms for Transcriptome Assembly, Allerton 2013.

Fundamental Limits and Optimal Algorithms for DNA Variant Calling

DNA Variant Calling 

DNA variant calling is a basic problem in computational biology:one would like to know how different is a target DNA (for example, a particular individuals DNA) from a known reference DNA (for example, a reference human genome). The standard approach is to perform shotgun sequencing on the target DNA, i.e., to obtain a set of short reads obtained as substrings from the target DNA sequence. The goal of the reconstruction algorithm is to characterize the variations in the target DNA from the reference DNA using the short reads. While there are different algorithms for calling different types of variation, for example, substitution, insertion/deletion (indel), structural variant, there is no theory nor a guiding principle behind when these algorithms are optimal. In this project, we seek to establish the fundamental limits of variant calling and propose new algorithms that can approach these limits. We also quantify the benefits of having the side information in the form of a reference DNA. In this ongoing project, we are in the process of building a reference based genome assembler.

S. Kannan, S. Mohajer and D. Tse, Fundamental Limits of DNA Variant Calling, Allerton 2014.

Causal Learning


While understanding correlation between variables gives predictive power, understanding causal connections between variables gives power to intervene and perturb the system. In order to do this, the time series of each variable, i.e., the corresponding random process, can provide valuable information. In this project, we explore the reverse engineering of causality networks using observed time series of variables. While this problem is well studied if the time-series are real valued, for example, Granger causality provides a good metric, there is very little work for the case of discrete variables. We propose a new method that can reverse engineer the causal network between boolean random processes under a certain model assumption, inspired from neuroscience. We are currently exploring the fundamentals of causal learning and its applications in computational biology.

In collaboration with Thomas Zheng and Victor Chan, Qualcomm Research.

S. Kannan and T. Zheng, A Method for Inferring Causal Dependencies between Random Processes, U.S. Patent pending

Wireless Networks

Capacity of Wireless Interference Networks

Wireless Network Capacity 

The density of wireless deployment is steadily increasing and interference between wireless devices will place a key bottleneck on communication networks. The current solution of orthogonal resource allocation will not scale well in this limit. In this project, we study the capacity of a network of nodes sharing a wireless medium, with several soures intending to communicate to several destinations. We show a general meta-theorem, which we prove for a variety of settings: If there is a "good" scheme for a channel, which is approximately cut-set acheiving and reciprocal, then for a network of such channels, local coding plus global routing is optimal to within a logarithmic factor in the number of messages. This information theoretic result establishes also that a simple layered architecture for communication is optimal, if it is carefully designed with the correct boundaries for the layers. Key technical contributions include a formal connection of the network information theory problem to the theory of approximation algorithms and the ruling out of more complex schemes in order to achieve the capacity to within the approximation factor.

S. Kannan and P. Viswanath, Capacity of multiple unicasts in wireless networks: A polymatroidal approach, IEEE Trans. Info. Theory, final revision.

S. Kamath, S. Kannan, and P. Viswanath, Wireless Networks with Symmetric Demands , IEEE ISIT, journal version in submission.

Approximation Algorithms for Polymatroidal Networks

Polymatroidal Network 

Networks are normally modelled to have edge or node capacities that constrain communication rate on a given edge or a node. However, several real world networks, including wireless and transportation networks, may display more complicated coupling of capacity constraints between edges and nodes. In this project, we "rediscover" a polymatroidal network model, where the capacity of edges that meet at a node, are jointly constrained by a submodular function. We show that for this model, the classical flow-cut gap results for the multicommodity problem can be generalized to this model. This model captures existing results on node and edge capacitated networks, and also single-commodity results for polymatroidal networks, which are originally derived using differing methods inside a single conceptual umbrella. The key technical contribution is the use of convex relaxation of the combinatorial cut problem using the Lovasz extension and a rounding scheme for the relaxed cut problem using line embeddings. Furthermore, we demonstrate applications of this result in deducing the capacity of wireless networks.

C. Chekuri, S. Kannan, A. Raja, and P. Viswanath, Multicommodity flows and cuts in polymatroidal networks, SIAM Journal on Computing, under review.

Capacity of Broadcast Networks with Relaying and Cooperation

Broadcast Network 

Relaying is a key feature proposed in the next generation of cellular networks in order to increase range (last mile coverage), increase capacity and to decrease network outage. When adding dedicated relays may prove to be expensive, existing users in the network can help relay information to other users (called user cooperation). While recent work has provided an approximate characterization of the information theoeretic capacity of relay networks, where a single source communicates to its destination through a network of relays, the characterization of capacity of broadcast networks remains open. In this project, we find the approximate capacity of downlink cellular networks with user cooperation and relaying. While it is hard to analyze the capacity of this networks directly, we create an operational connection to a determinisitic network, for which it is simpler to design provably near-optimal techniques. The proposed scheme has an interesting interpretation: the relays do random coding, and the destination fixes a decoding map, and then the source constructs a special codebook based on the realization of the codebooks used at the relays.

S. Kannan, A. Raja, and P. Viswanath, Approximately Optimal Wireless Broadcasting, IEEE Trans. Info Theory, Dec. 2012.

Interactive Interference Alignment

Interactive Interference Network 

Interference is a key bottleneck in modern wireless networks, and studying schemes that can manage interference well is an important topic of research. In this context, interference alignment has emerged as an important scheme with several attractive properties, however, the scheme is currently far from being practical. In this project, we study interference channels with the added feature of interaction, i.e., the destinations can talk back to the sources. The interaction can come in two ways: 1) for half-duplex radios, destinations can talk back to sources using either simultaneous out-of-band (white spaces) transmission or in-band half-duplex transmission; 2) for full-duplex radios, both sources and destinations can transmit and listen in the same channel simultaneously. The flexibility afforded by interaction among sources and destinations allows for the derivation of interference alignment (IA) strategies that have desirable “engineering properties”: insensitivity to the rationality or irrationality of channel parameters, small block lengths and finite SNR operations. We show that for "small" interference channels the interactive interference alignment scheme can achieve the optimal degrees of freedom.

Q. Geng, S. Kannan, and P. Viswanath, Interactive Interference Alignment, IEEE JSAC, in submission.

Distributed Function Computation

Function Computation 

In several scenarios, we are interested in computing some function of the data storied in different locations connected by a common communication infrastructure. This occurs, for example, in sensor networks where we want to compute a function of the sensor data or in cloud computing where data is stored on disks using a distributed storage system. In these contexts where communication is a key bottleneck, we are interested in the question of the capacity of function computation. In this project, we show that for many classes of functions, including symmetric functions, a simple scheme involving computation trees is near optimal if all the edges in the graph have bidirectional communication infrastructure.

S. Kannan and P. Viswanath, Multi-session Function Computation in Undirected Graphs, JSAC, April 2013.

Distributed Computing with Noise: Tree Codes

Tree Coding! 

With increasing miniaturization and voltage scaling, the buses connecting multiple distributed computing cores become ever more noisy. In order for the cores to execute their computation objective reliably, some form of error correction coding is mandatory. Error correction becomes complicated by the fact that bits cannot be blocked up into large chunks without first learning the output from the other cores. Therefore, a certain form of causal coding called tree coding has been proposed in the literature to deal with this problem. While existence of good tree codes have been known, how to construct one efficiently has not been well understood. In this project, we propose a new polynomial time construction of tree codes that succeeds with inverse quasi-polynomial probability.

Y. Kalai, S. Kannan, and M. Sudan, A Randomized Construction of Tree Codes, in preparation.