Statistical Models in Biology
STATS215 · Winter 2020 · Stanford University
Course Description
This course is about probabilistic models in biology and the statistical inference algorithms necessary to fit them to data. We will cover some of the most important tools for modeling biological data, including latent variable models, hidden Markov models, dynamical systems, Poisson processes, and recent extensions like variational autoencoders and recurrent neural networks. We will study algorithms for parameter estimation and inference, and we will apply these tools to a variety of problems across biology, with a particular emphasis on applications in neuroscience. In your homework assignments and final project, you will implement these models and algorithms and apply them to real data.
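For a sense of the workflow, here is a minimal, purely illustrative sketch (not part of the official course materials) of the basic loop we will repeat throughout the quarter: posit a probabilistic model, simulate data from it, and fit its parameters by maximum likelihood. It uses Python/NumPy, the same stack as the Colab assignments; the numbers are made up.

```python
# A toy version of the course workflow: model -> simulated data -> fit.
import numpy as np

rng = np.random.default_rng(0)
true_rate = 4.2                             # hypothetical spikes per time bin
counts = rng.poisson(true_rate, size=500)   # simulated spike counts

# For a Poisson model, the maximum likelihood estimate of the rate
# is simply the sample mean of the observed counts.
rate_mle = counts.mean()
print(f"true rate = {true_rate:.2f}, MLE = {rate_mle:.2f}")
```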
Prerequisites
You should be comfortable with the basics of probability at the level of STAT116, including random variables, joint densities, conditional distributions, etc. You should also be familiar with linear regression and maximum likelihood estimation, for example from STAT200. We will use basic linear algebra (solving linear systems and using eigendecompositions) and multivariable calculus (gradients, Jacobians, Hessians, etc.).
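As an informal self-check on the assumed linear algebra and calculus background (an illustrative example, not an official diagnostic), you should be able to read a NumPy snippet like the following and anticipate what it computes.

```python
# Illustrative only: solving a linear system, taking an eigendecomposition,
# and reasoning about the gradient and Hessian of a simple quadratic.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
A = A @ A.T + 3.0 * np.eye(3)          # a symmetric positive-definite matrix
b = rng.normal(size=3)

x = np.linalg.solve(A, b)              # solve the linear system A x = b
evals, evecs = np.linalg.eigh(A)       # eigendecomposition of a symmetric matrix
assert np.allclose(evecs @ np.diag(evals) @ evecs.T, A)

# The gradient of f(x) = 1/2 x^T A x - b^T x is A x - b, and its Hessian is A,
# so the gradient should vanish at the solution of A x = b.
assert np.allclose(A @ x - b, 0.0)
```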
Logistics
- Instructor: Scott Linderman
- Teaching Assistant: Kevin Han
- Lectures: Tuesday and Thursday, 12-1:20pm
- Location: 540-108
- Office Hours:
  - Scott: Tuesday 2-3pm, Sequoia Hall Rm 232
  - Kevin: Monday 3-5pm, Sequoia Hall Library
Grading
- 4 Assignments: 15% each
- Midterm Exam: 10%
- Final Project Proposal: 5%
- Final Project Report: 20%
- In-class Participation: 5%
Late policy: All assignments are due at 11:59pm PT on the due date. You can turn in homework assignments up to a week late with a 50% penalty on your grade. Final project reports will not be accepted after the deadline.
Midterm: The midterm will be given in class.
Assignments
Assignments consist of math problems and coding problems. You will find a LaTeX template for your write-up in the GitHub folder. Submit a PDF of your write-up, along with the .ipynb and a PDF export of the Google Colab notebook containing your completed coding assignment. All submissions should be made through Canvas.
Final project
The final project is an opportunity to apply these models and algorithms to datasets of interest to you. Ideally, your project will involve some theory (extending a model to fit the needs of your particular problem and studying its properties) and some experimentation (fitting the model to biological data).
Proposal: Start thinking about what datasets you'd like to study and what questions you'd like to answer early! These will inform your choices about modeling tools. As a forcing function, part of your grade will be based on your proposal, which is due roughly a month before the final report. By this point, you should have a pretty clear idea about the dataset and question, and some initial thoughts about the types of models you will explore and experiments you will run.
Report: The final report will present your theoretical work and experimental results. It should look like the start of a research paper. To that end, you will write it in the NeurIPS paper format, and you will submit a link to a GitHub repository with supporting code.
Readings
The readings are meant to supplement lecture with further details and to show examples of how different models are used in biology. I've listed multiple technical readings and applications; you do not have to read all of them. You may find it helpful to see the same concepts presented in different ways, or you may find that you prefer some authors' styles to others. Likewise, I've listed multiple references for many biological applications; I will reference some of these in class, and if you're curious you can dig into the others as well.
Anticipated Schedule
| Lecture | Date | Topic | Technical Readings | Applications |
|---|---|---|---|---|
| 1 | Jan 7 | Probability Review | Bishop (2006) Ch 2 · Murphy (2013) Ch 2 | |
| 2 | Jan 9 | Graphical Models, Learning, and Inference | Bishop (2006) Ch 8 · Barber (2010) Ch 3 and 4 · Wainwright and Jordan (2008) Ch 2 | Gene interaction networks: Friedman et al (2000) |
| 3 | Jan 14 | Bayesian Linear Regression | Bishop (2006) Ch 3 and 4 · Hastie et al (2009) Ch 3 and 4 · Murphy (2013) Ch 7-9 | GWAS: Visscher et al (2017) · Hilary Finucane's Notes · Kang et al (2010) · Lippert et al (2011) and Supp |
| 4 | Jan 16 | Logistic Regression (HW1 Out) | Goodfellow et al (2016) Ch 6, 8, 9 | Predicting retinal responses: Pillow et al (2008) · McIntosh et al (2016) · Batty et al (2017) |
| 5 | Jan 21 | Generalized Linear Models and Exponential Families | Murphy (2013) Ch 9 · demo notebook | Predicting retinal responses: Pillow et al (2008) · McIntosh et al (2016) · Batty et al (2017) |
| 6 | Jan 23 | Latent Variable Models: Mixtures, Factors, and EM, Part I | Murphy (2013) Ch 11 and 12 · Bishop (2006) Ch 9 and 12 | Spike sorting: Pachitariu et al (2016) · Liam Paninski's Notes |
| 7 | Jan 28 | Latent Variable Models: Mixtures, Factors, and EM, Part II | Murphy (2013) Ch 11 and 12 · Bishop (2006) Ch 9 and 12 | Receptive fields: Liu et al (2017) · Finding motifs: Mackevicius et al (2019) |
| 8 | Jan 30 | Latent Variable Models: Mixtures, Factors, and EM, Part III (HW1 Due · HW2 Out) | Murphy (2013) Ch 11 and 12 · Bishop (2006) Ch 9 and 12 · demo notebook | Pose estimation with missing data: Markowitz et al (2018) |
| 9 | Feb 4 | Variational Inference and Nonlinear LVMs: Part I | Blei et al (2017) | Network models: Gopalan and Blei (2013) · Linderman et al (2016) |
| — | Feb 6 | Midterm Exam | | |
| 10 | Feb 11 | Midterm Review | | |
| 11 | Feb 13 | Variational Inference and Nonlinear LVMs: Part II | Kingma and Welling (2019) Ch 2 · demo notebook | Single-cell RNAseq: Lopez et al (2018) and blog · Grønbech et al (2018) |
| 12 | Feb 18 | Hidden Markov Models (HW2 Due · HW3 Out) | Bishop (2006) Ch 13 · Barber (2010) Ch 23 · Murphy (2013) Ch 17 | Calcium deconvolution: Friedrich et al (2017) · Jewell et al (2018, 2019) |
| 13 | Feb 20 | Linear Gaussian Dynamical Systems (Project Proposal Due) | Barber (2010) Ch 24 · Murphy (2013) Ch 18 · demo notebook | Neural state spaces: Paninski et al (2010) · Macke et al (2011) |
| 14 | Feb 25 | Switching Linear Dynamical Systems | Barber (2010) Ch 25 · Linderman et al (2017) | Postural dynamics: Wiltschko et al (2015) · Neural circuit dynamics: Linderman et al (2019) · Taghia et al (2018) |
| 15 | Feb 27 | Gaussian Processes | Rasmussen and Williams (2006) Ch 2 · Hensman et al (2013) · Titsias and Lawrence (2010) · Wang et al (2019) | Odor representation in cortex: Wu et al (2017 and 2018) |
| 16 | Mar 3 | Guest Lecture: Matt Johnson, Structured, Sequential VAEs (HW3 Due · HW4 Out) | Johnson et al (2016) | Nonlinear embedding of neural activity: Gao et al (2016) · Pandarinath et al (2018) |
| 17 | Mar 5 | Poisson Processes | Kingman (1993) Ch 1 and 2 · Uri Eden's Notes | Neural firing rates: Brown et al (2002) · Truccolo et al (2005) · Cunningham et al (2008a and 2008b) · Loaiza-Ganem et al (2019) |
| 18 | Mar 10 | Continuous Time Markov Chains | Rao and Teh (2013) | Complex synapses: Lahiri and Ganguli (2013) and Supp |
| 19 | Mar 12 | Hawkes and Cox Processes (HW4 Due) | Hawkes (1971) · Kingman (1993) Ch 6 | Social contagion: Linderman and Adams (2014) |
| — | Mar 20 | Final Report Due | | |
Textbooks
Barber, D. (2012). Bayesian reasoning and machine learning. Cambridge University Press.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Ewens, W. J., & Grant, G. (2005). Statistical methods in bioinformatics. Springer. (Available in the library.)
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Kingman, J. F. C. (1993). Poisson processes. Clarendon Press. (Available in the library.)
Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT Press. (Available in the library.)
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.
Robert, C., & Casella, G. (2013). Monte Carlo statistical methods. Springer.