Deep Learning and Information Theory: An Emerging Interface

Tutorial given at the International Symposium on Information Theory (ISIT), 2018

Slides and a video recording of the tutorial are available.

Abstract

Modern deep learning has brought forth discoveries across multiple disciplines: computer vision, speech recognition, natural language processing, and the ability to learn games purely through self-play. Much of this progress is powered by the ability to acquire large amounts of data, together with an inductive bias in deep learning that is well matched to the problem domain. In this tutorial we will explore the interplay of this emerging technology with information theory. In particular, we will cover two themes.

(1) Applications of Deep Learning to Information Theory: The information theory community has spearheaded several breakthroughs in code design and decoding algorithms that have revolutionized modern digital communication. In this theme, we examine whether modern deep learning can be used to accelerate the discovery of such coding schemes. We will cover several developments in this area, showing that the Viterbi and BCJR algorithms can be "learned" from observed data, and that decoding algorithms better than message passing can be learned for high-density codes. Furthermore, the well-studied setting of channel coding, where an essentially unlimited amount of training data can be generated and where near-optimal coding strategies are already known in several settings, provides a lens through which present deep learning technology can be improved and enhanced. Beyond code design, deep learning, when viewed as a general-purpose function approximator, has the potential to be more widely applicable in information theory; we will touch upon this general idea. Indeed, some recent works have used deep learning for (conditional) independence testing, mutual information estimation, and compressed sensing, as well as for false-discovery-rate control in multiple hypothesis testing.
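To make the data-driven view of decoding concrete: in channel coding, labeled training pairs (message bits, noisy channel outputs) can be simulated in unlimited quantities by passing random messages through an encoder and a channel model. The sketch below is only an illustration of that idea under assumptions of our own choosing; it uses a toy repetition code and a small feed-forward network rather than the recurrent architectures used in the works discussed in the tutorial, and all names and hyperparameters are ours.

```python
# Illustrative sketch only (assumed toy setup, not the tutorial's models):
# train a small neural decoder for a rate-1/3 repetition code over a
# simulated AWGN channel. The key point is that training data can be
# generated in unlimited quantities.
import torch
import torch.nn as nn

K, REP = 4, 3                  # message bits per block, repetition factor
N = K * REP                    # channel uses per block
SNR_DB = 0.0
NOISE_STD = 10 ** (-SNR_DB / 20)

def simulate_batch(batch_size=256):
    bits = torch.randint(0, 2, (batch_size, K)).float()   # random messages
    codeword = bits.repeat_interleave(REP, dim=1)          # repetition encoding
    tx = 2.0 * codeword - 1.0                              # BPSK modulation
    rx = tx + NOISE_STD * torch.randn_like(tx)             # AWGN channel
    return rx, bits

decoder = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, K))
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):                                   # fresh simulated data at every step
    rx, bits = simulate_batch()
    loss = loss_fn(decoder(rx), bits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():                                      # bit error rate on fresh data
    rx, bits = simulate_batch(10_000)
    ber = ((decoder(rx) > 0).float() != bits).float().mean().item()
    print(f"BER at {SNR_DB} dB: {ber:.4f}")
```

At a high level, replacing the toy encoder with a convolutional encoder and the feed-forward network with a recurrent one is how the learned Viterbi/BCJR-style decoders mentioned above are trained.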

(2) Information-Theoretic Principles for Deep Learning: In the second theme, we will provide a bird's-eye survey of the utility of information-theoretic principles in understanding and designing deep learning systems. Such works can be broadly classified into three categories: (a) representation, (b) learnability, and (c) generalization. (A) A basic result in deep learning is that neural networks can closely approximate any continuous function. There are several modern generalizations of such representation theorems that characterize the size and depth of networks required to approximate various function classes, as well as some invariance properties; we will survey these results. (B) There are emerging works, including tensor methods, that provide learnability guarantees for neural networks and mixtures of experts under some mathematical assumptions. We will survey this area, highlighting how the non-convexity barrier can be bypassed. As another example, we have shown how to avoid mode collapse when training generative adversarial networks by applying Blackwell's theorem on hypothesis testing. (C) Generalization bounds guarantee performance beyond the observed data, and some of these bounds admit an information-theoretic characterization.
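For concreteness, here are informal statements of two standard results behind categories (A) and (C) above, stated as a sketch in our own notation rather than in full generality: the classical universal approximation theorem, and an information-theoretic generalization bound in the style of Xu and Raginsky.

```latex
% (A) Universal approximation (informal): for a suitable activation \sigma,
% any continuous f on a compact set $K \subset \mathbb{R}^d$, and any
% $\varepsilon > 0$, some one-hidden-layer network is uniformly close to f:
\[
  \exists\, m,\; \{c_i, w_i, b_i\}_{i=1}^{m} : \quad
  \sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{m} c_i\, \sigma(w_i^{\top} x + b_i) \Bigr| < \varepsilon .
\]

% (C) An information-theoretic generalization bound (Xu--Raginsky style):
% if the loss is $\sigma$-sub-Gaussian (here $\sigma$ denotes the
% sub-Gaussian parameter), W is the learned hypothesis, and S is the
% training set of n i.i.d. samples, then the expected generalization gap
% is controlled by the mutual information between W and S:
\[
  \bigl| \mathbb{E}\,[\mathrm{gen}(W, S)] \bigr|
  \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)} .
\]
```

The second bound makes precise the intuition that a learning algorithm whose output retains little information about its training data cannot overfit that data.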

Bios of the Speakers

Sreeram Kannan is currently an assistant professor at the University of Washington, Seattle. He was a postdoctoral scholar at the University of California, Berkeley, from 2012 to 2014, before which he received his Ph.D. in Electrical and Computer Engineering and M.S. in Mathematics from the University of Illinois at Urbana-Champaign. He is a recipient of the 2017 NSF Faculty Early CAREER award, the Van Valkenburg outstanding dissertation award from UIUC (2013), a co-recipient of the Qualcomm Cognitive Radio Contest first prize (2010), a recipient of the Qualcomm (CTO) Roberto Padovani outstanding intern award (2010), a recipient of the gold medal from the Indian Institute of Science (2008), and a co-recipient of the Intel India Student Research Contest first prize (2006). His research interests are in information theory and machine learning and their applications in communications and computational biology.

Hyeji Kim is a postdoctoral research associate with the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. She received her Ph.D. and M.S. degrees in Electrical Engineering from Stanford University in 2016 and 2013, respectively, and her B.S. degree with honors in Electrical Engineering from the Korea Advanced Institute of Science and Technology (KAIST) in 2011. Her research interests include information theory, machine learning, and wireless communications. She is a recipient of the Stanford Graduate Fellowship and participated in the Rising Stars in EECS Workshop in 2015.

Sewoong Oh is an Assistant Professor of Industrial and Enterprise Systems Engineering at UIUC. He received his Ph.D. from the Department of Electrical Engineering at Stanford University. Following his Ph.D., he worked as a postdoctoral researcher at the Laboratory for Information and Decision Systems (LIDS) at MIT. His research interest is in theoretical machine learning, including spectral methods, ranking, crowdsourcing, estimation of information measures, differential privacy, and generative adversarial networks. He was co-awarded the best paper award at SIGMETRICS in 2015, and received the NSF CAREER award in 2016, the SIGMETRICS Rising Star award in 2017, and a Google Faculty Research Award.