Statistical Models in Biology
STATS215 · Winter 2020 · Stanford University
Course Description
This course is about probabilistic models in biology and the statistical inference algorithms necessary to fit them to data. We will cover some of the most important tools for modeling biological data, including latent variable models, hidden Markov models, dynamical systems, Poisson processes, and recent extensions like variational autoencoders and recurrent neural networks. We will study algorithms for parameter estimation and inference, and we will apply these tools to a variety of problems across biology, with a particular emphasis on applications in neuroscience. In your homework assignments and final project, you will implement these models and algorithms and apply them to real data.
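For a sense of the workflow, here is a minimal, purely illustrative sketch (not part of the official course materials) of the basic loop we will repeat throughout the quarter: posit a probabilistic model, simulate data from it, and fit its parameters by maximum likelihood. It uses Python/NumPy, the same stack as the Colab assignments; the numbers are made up.

```python
# A toy version of the course workflow: model -> simulated data -> fit.
import numpy as np

rng = np.random.default_rng(0)
true_rate = 4.2                             # hypothetical spikes per time bin
counts = rng.poisson(true_rate, size=500)   # simulated spike counts

# For a Poisson model, the maximum likelihood estimate of the rate
# is simply the sample mean of the observed counts.
rate_mle = counts.mean()
print(f"true rate = {true_rate:.2f}, MLE = {rate_mle:.2f}")
```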
Prerequisites
You should be comfortable with the basics of probability at the level of STAT116, including random variables, joint densities, conditional distributions, etc. You should also be familiar with linear regression and maximum likelihood estimation, for example from STAT200. We will use basic linear algebra (solving linear systems and using eigendecompositions) and multivariable calculus (gradients, Jacobians, Hessians, etc.).
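As an informal self-check on the assumed linear algebra and calculus background (an illustrative example, not an official diagnostic), you should be able to read a NumPy snippet like the following and anticipate what it computes.

```python
# Illustrative only: solving a linear system, taking an eigendecomposition,
# and reasoning about the gradient and Hessian of a simple quadratic.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
A = A @ A.T + 3.0 * np.eye(3)          # a symmetric positive-definite matrix
b = rng.normal(size=3)

x = np.linalg.solve(A, b)              # solve the linear system A x = b
evals, evecs = np.linalg.eigh(A)       # eigendecomposition of a symmetric matrix
assert np.allclose(evecs @ np.diag(evals) @ evecs.T, A)

# The gradient of f(x) = 1/2 x^T A x - b^T x is A x - b, and its Hessian is A,
# so the gradient should vanish at the solution of A x = b.
assert np.allclose(A @ x - b, 0.0)
```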
Logistics
- Instructor: Scott Linderman
- Teaching Assistant: Kevin Han
- Lectures: Tuesday and Thursday, 12-1:20pm
- Location: 540-108
- Office Hours:
  - Scott: Tuesday 2-3pm, Sequoia Hall Rm 232
  - Kevin: Monday 3-5pm, Sequoia Hall Library
Grading
- 4 Assignments: 15% each
- Midterm Exam: 10%
- Final Project Proposal: 5%
- Final Project Report: 20%
- In-class Participation: 5%
Late policy: All assignments are due at 11:59pm PT on the due date. You can turn in homework assignments up to a week late with a 50% penalty on your grade. Final project reports will not be accepted after the deadline.
Midterm: The midterm will be given in class.
Assignments
Assignments consist of math problems and coding problems. You will find a LaTeX template for your write-up in the GitHub folder. Submit a PDF of your write-up, along with the .ipynb and a PDF export of the Google Colab notebook containing your completed coding assignment. All submissions should be made through Canvas.
Final project
The final project is an opportunity to apply these models and algorithms to datasets of interest to you. Ideally, your project will involve some theory (extending a model to fit the needs of your particular problem and studying its properties) and some experimentation (fitting the model to biological data).
Proposal: Start thinking about what datasets you'd like to study and what questions you'd like to answer early! These will inform your choices about modeling tools. As a forcing function, part of your grade will be based on your proposal, which is due roughly a month before the final report. By this point, you should have a pretty clear idea about the dataset and question, and some initial thoughts about the types of models you will explore and experiments you will run.
Report: The final report will present your theoretical work and experimental results. It should look like the start of a research paper. To that end, you will write it in the NeurIPS paper format, and you will submit a link to a GitHub repository with supporting code.
Readings
The readings are meant to supplement lecture with further details and to show examples of how different models are used in biology. I've listed multiple technical readings and applications; you do not have to read all of them. You may find it helpful to see the same concepts presented in different ways, or you may find that you prefer some authors' styles to others. Likewise, I've listed multiple references for many biological applications; I will reference some of these in class, and if you're curious you can dig into the others as well.
Anticipated Schedule
| Lecture | Date | Topic | Technical Readings | Applications |
|---|---|---|---|---|
| 1 | Jan 7 | Probability Review | Bishop (2006) Ch 2 · Murphy (2013) Ch 2 | |
| 2 | Jan 9 | Graphical Models, Learning, and Inference | Bishop (2006) Ch 8 · Barber (2010) Ch 3 and 4 · Wainwright and Jordan (2008) Ch 2 | Gene interaction networks: Friedman et al (2000) |
| 3 | Jan 14 | Bayesian Linear Regression | Bishop (2006) Ch 3 and 4 · Hastie et al (2009) Ch 3 and 4 · Murphy (2013) Ch 7-9 | GWAS: Visscher et al (2017) · Hilary Finucane's Notes · Kang et al (2010) · Lippert et al (2011) and Supp |
| 4 | Jan 16 | Logistic Regression (HW1 Out) | Goodfellow et al (2016) Ch 6, 8, 9 | Predicting retinal responses: Pillow et al (2008) · McIntosh et al (2016) · Batty et al (2017) |
| 5 | Jan 21 | Generalized Linear Models and Exponential Families | Murphy (2013) Ch 9 · demo notebook | Predicting retinal responses: Pillow et al (2008) · McIntosh et al (2016) · Batty et al (2017) |
| 6 | Jan 23 | Latent Variable Models: Mixtures, Factors, and EM, Part I | Murphy (2013) Ch 11 and 12 · Bishop (2006) Ch 9 and 12 | Spike sorting: Pachitariu et al (2016) · Liam Paninski's Notes |
| 7 | Jan 28 | Latent Variable Models: Mixtures, Factors, and EM, Part II | Murphy (2013) Ch 11 and 12 · Bishop (2006) Ch 9 and 12 | Receptive fields: Liu et al (2017) · Finding motifs: Mackevicius et al (2019) |
| 8 | Jan 30 | Latent Variable Models: Mixtures, Factors, and EM, Part III (HW1 Due · HW2 Out) | Murphy (2013) Ch 11 and 12 · Bishop (2006) Ch 9 and 12 · demo notebook | Pose estimation with missing data: Markowitz et al (2018) |
| 9 | Feb 4 | Variational Inference and Nonlinear LVMs: Part I | Blei et al (2017) | Network models: Gopalan and Blei (2013) · Linderman et al (2016) |
| — | Feb 6 | Midterm Exam | | |
| 10 | Feb 11 | Midterm Review | | |
| 11 | Feb 13 | Variational Inference and Nonlinear LVMs: Part II | Kingma and Welling (2019) Ch 2 · demo notebook | Single-cell RNAseq: Lopez et al (2018) and blog · Grønbech et al (2018) |
| 12 | Feb 18 | Hidden Markov Models (HW2 Due · HW3 Out) | Bishop (2006) Ch 13 · Barber (2010) Ch 23 · Murphy (2013) Ch 17 | Calcium deconvolution: Friedrich et al (2017) · Jewell et al (2018, 2019) |
| 13 | Feb 20 | Linear Gaussian Dynamical Systems (Project Proposal Due) | Barber (2010) Ch 24 · Murphy (2013) Ch 18 · demo notebook | Neural state spaces: Paninski et al (2010) · Macke et al (2011) |
| 14 | Feb 25 | Switching Linear Dynamical Systems | Barber (2010) Ch 25 · Linderman et al (2017) | Postural dynamics: Wiltschko et al (2015) · Neural circuit dynamics: Linderman et al (2019) · Taghia et al (2018) |
| 15 | Feb 27 | Gaussian Processes | Rasmussen and Williams (2006) Ch 2 · Hensman et al (2013) · Titsias and Lawrence (2010) · Wang et al (2019) | Odor representation in cortex: Wu et al (2017 and 2018) |
| 16 | Mar 3 | Guest Lecture: Matt Johnson, Structured, Sequential VAEs (HW3 Due · HW4 Out) | Johnson et al (2016) | Nonlinear embedding of neural activity: Gao et al (2016) · Pandarinath et al (2018) |
| 17 | Mar 5 | Poisson Processes | Kingman (1993) Ch 1 and 2 · Uri Eden's Notes | Neural firing rates: Brown et al (2002) · Truccolo et al (2005) · Cunningham et al (2008a and 2008b) · Loaiza-Ganem et al (2019) |
| 18 | Mar 10 | Continuous Time Markov Chains | Rao and Teh (2013) | Complex synapses: Lahiri and Ganguli (2013) and Supp |
| 19 | Mar 12 | Hawkes and Cox Processes (HW4 Due) | Hawkes (1971) · Kingman (1993) Ch 6 | Social contagion: Linderman and Adams (2014) |
| — | Mar 20 | Final Report Due | | |
Textbooks
Barber, D. (2012). Bayesian reasoning and machine learning. Cambridge University Press.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Ewens, W. J., & Grant, G. (2005). Statistical methods in bioinformatics. Springer. (Available in the library.)
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Kingman, J. F. C. (1993). Poisson processes. Clarendon Press. (Available in the library.)
Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT Press. (Available in the library.)
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.
Robert, C., & Casella, G. (2013). Monte Carlo statistical methods. Springer.