Bayesian Statistics
A PhD-level course at [EMAp](https://emap.fgv.br/en).
To compile the slides, run

```
pdflatex -interaction=nonstopmode --shell-escape bayes_stats
```

a few times; repeated runs are needed for cross-references to resolve.
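If you have latexmk available (an assumption about your TeX setup, not something this repo requires), a single `latexmk -pdf -shell-escape bayes_stats` should take care of the reruns automatically.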
Pre-requisites
- Measure-theoretic probability. Jeff Rosenthal's book, A First Look at Rigorous Probability Theory, is a good resource.
- Classical Statistics at the same level as Mathematical Statistics. For a review, I suggest Theory of Statistics by Mark Schervish.
Books
- The Bayesian Choice (BC) by Christian Robert will be our main source.
- A first course in Bayesian statistical methods (FC) by Peter Hoff is a good all-purpose introduction.
- Theory of Statistics (SV) by Mark Schervish is a comprehensive reference.
- Bayesian Theory (BT) by José Bernardo and Adrian Smith is a technical behemoth, suitable for use as a reference guide.
Resources
- An overview of computing techniques for Bayesian inference can be found here.
- See Esteves, Stern and Izbicki's course notes.
- Rafael Stern's excellent course.
- Principles of Uncertainty by the inimitable J. Kadane is a book about avoiding being a sure loser. See this review by Christian Robert.
- Bayesian Learning by Mattias Villani is a book for a computation-minded audience.
- Michael Betancourt's website is a treasure trove of rigorous, modern and insightful applied Bayesian statistics. See this as a gateway drug.
- Awesome Bayes is a curated list of Bayesian resources, including blog posts and podcasts.
Acknowledgements
Guido Moreira and Isaque Pim suggested topics, exercises and exam questions. Lucas Moschen made many good suggestions.
Exercises
We keep a list here. I recommend you check back every so often because this is likely to be updated (if infrequently).
News
- Papers for the assignment are here. A bib file is also made available. Turn in your choice by 18h (Brasília time) on 2024-06-19.
- The discussion guide is now available. The hand-in deadline is 2024-07-05 at 16h (Brasília time).
Syllabus
Lecture 0: Overview
- The LaplacesDemon introductory vignette gives a very nice overview of Bayesian Statistics.
- What is a statistical model? by Peter McCullagh gives a good explanation of what a statistical model is. See also BC Ch1.
- There are a few Interpretations of Probability, and it's important to understand them so that the various schools of statistical inference make sense.
- WHAT IS BAYESIAN/FREQUENTIST INFERENCE? by Larry Wasserman is a must-read in order to understand what makes each inference paradigm tick.
- This Cross Validated post has a very nice, measure-theoretic proof of Bayes's theorem.
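For orientation, the statement that the post makes rigorous, in its familiar density form (the notation pi for the prior and f for the sampling density is our assumption here):

```latex
\pi(\theta \mid x)
  = \frac{f(x \mid \theta)\, \pi(\theta)}
         {\int_{\Theta} f(x \mid t)\, \pi(t)\, \mathrm{d}t}.
```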
Lecture 1: Principled Inference, decision-theoretical foundations
- Berger and Wolpert's 1988 monograph is the definitive text on the Likelihood Principle (LP).
- See this paper by Franklin and Bambirra for a generalised version of the LP.
- As advanced reading, one can consider Birnbaum (1962) and a helpful review paper published 30 years later by Bjornstad.
- Michael Evans has a few papers on the LP. See Evans, Fraser & Monette (1986) for an argument using a stronger version of the Conditionality Principle (CP), and Evans (2013) for a flaw in Birnbaum's original 1962 paper.
- Deborah G. Mayo challenged Birnbaum's argument for the LP, but she implicitly changed the statement of the Sufficiency Principle (SP), nullifying her point. This Cross Validated post adds more detail to the story and to the relevance of the LP.
Lecture 2: Belief functions, coherence, exchangeability
- David Alvarez-Melis and Tamara Broderick were kind enough to provide an English translation of De Finetti's seminal 1930 paper.
- Heath and Sudderth (1976) provide a simpler proof of De Finetti's representation theorem for binary variables.
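For orientation, a minimal statement of the binary case (notation ours): if X_1, X_2, ... is an infinite exchangeable sequence of {0, 1}-valued random variables, then there exists a probability measure Q on [0, 1] such that

```latex
P(X_1 = x_1, \ldots, X_n = x_n)
  = \int_0^1 \theta^{s_n} (1 - \theta)^{n - s_n}\, \mathrm{d}Q(\theta),
  \qquad s_n = \sum_{i=1}^{n} x_i.
```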
Lecture 3: Priors I: rationale and construction; conjugate analysis
- The SHeffield ELicitation Framework (SHELF) is a package for rigorous elicitation of probability distributions.
- John Cook hosts a nice compendium of conjugate priors compiled by Daniel Fink.
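As a concrete illustration of conjugate analysis, here is a minimal Python sketch of the Beta-Binomial update; the function name and the numbers are purely illustrative, not taken from the compendium:

```python
from scipy import stats

def beta_binomial_update(a, b, successes, failures):
    """Return the parameters of the Beta posterior obtained by
    conjugate updating of a Beta(a, b) prior with binomial data."""
    return a + successes, b + failures

# Illustrative numbers: Beta(2, 2) prior, 7 successes in 10 trials.
a_post, b_post = beta_binomial_update(2, 2, 7, 3)
posterior = stats.beta(a_post, b_post)
print("posterior mean:", posterior.mean())
print("central 95% credible interval:", posterior.interval(0.95))
```

The point of conjugacy is visible in the code: the posterior lives in the same family as the prior, so updating reduces to parameter arithmetic.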
Lecture 4: Priors II: types of priors; implementation
Required reading
- Hidden Dangers of Specifying Noninformative Priors is a must-read for those who wish to understand the counter-intuitive nature of prior measures and their push-forwards.
- The Prior Can Often Only Be Understood in the Context of the Likelihood explains that, from a practical perspective, priors can be seen as regularisation devices and should control what the model does rather than what values the parameter takes.
- Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors shows how to formalise the idea that one should prefer a simpler model by penalising deviations from a base model.
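In one line, the construction from that paper (notation ours): measure deviation from the base model via the Kullback-Leibler divergence and place a memoryless exponential prior on that distance,

```latex
d(\xi) = \sqrt{2\, \mathrm{KLD}\!\left( f(\cdot \mid \xi) \,\middle\|\, f(\cdot \mid \xi = 0) \right)},
\qquad
\pi(d) = \lambda e^{-\lambda d}, \quad \lambda > 0,
```

which is then pushed back to a prior on xi by a change of variables.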
Optional reading
- The Case for Objective Bayesian Analysis is a good read to try and put objective Bayes on solid footing.
Lecture 5: Bayesian point estimation
- The paper The Federalist Papers As a Case Study by Mosteller and Wallace (1984) is a very nice example of capture-recapture models. It is cited in Sharon McGrayne's book "The Theory That Would Not Die" as a triumph of Bayesian inference. It is also a serious contender for coolest paper abstract ever.
- This post on the Andrew Gelman blog discusses how to deal with the sample size (n) in a Bayesian problem: either write out a full model that specifies a probabilistic model for n, or write an approximate prior pi(theta | n).
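In symbols, the two routes contrasted in that post are (notation ours):

```latex
% Route 1: a full model that treats n as random
p(\theta, n \mid x) \propto p(x \mid \theta, n)\, p(\theta)\, p(n).
% Route 2: an approximate prior that conditions on the observed n
p(\theta \mid x, n) \propto p(x \mid \theta, n)\, \pi(\theta \mid n).
```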
Lecture 6: Bayesian Testing I
Required reading
- In their seminal 1995 paper, Robert Kass and Adrian Raftery give a nice overview of Bayes factors, along with recommendations for their use.
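For orientation, the central object of that paper: the Bayes factor of hypothesis H_0 against H_1 is the ratio of their marginal likelihoods (notation ours, with pi_0 and pi_1 the priors restricted to each hypothesis):

```latex
B_{01}(x)
  = \frac{m_0(x)}{m_1(x)}
  = \frac{\int_{\Theta_0} f(x \mid \theta)\, \pi_0(\theta)\, \mathrm{d}\theta}
         {\int_{\Theta_1} f(x \mid \theta)\, \pi_1(\theta)\, \mathrm{d}\theta}.
```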
Optional reading
- This paper by Christian Robert gives a nice discussion of the Jeffreys-Lindley paradox.
- This paper by Wagenmakers is an excellent historical account of the paradox and clears up many misconceptions.
- Jaynes's 1976 monograph Confidence Intervals vs Bayesian Intervals is a great source of useful discussion. PDF.
Lecture 7: Bayesian Testing II
- This paper by Lavine and Schervish provides a nice "disambiguation" for what Bayes factors can and cannot do inferentially.
- Yao et al. (2018), along with the ensuing discussion, is a must-read for an understanding of modern prediction-based Bayesian model comparison.
Lecture 8: Asymptotics
- The Encyclopedia of Mathematics entry on the Bernstein-von Mises theorem is nicely written.
- The integrated nested Laplace approximation (INLA) methodology leverages Laplace approximations to provide accurate posterior approximations in latent Gaussian models, which cover a huge class of models used in applied work. This paper by Thiago G. Martins and others, especially Section 2, is a good introduction.
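For orientation, the Laplace approximation at the heart of both topics (notation ours): expand the log-posterior to second order around its mode to get a Gaussian approximation,

```latex
\pi(\theta \mid x) \approx \mathcal{N}\!\left( \hat{\theta},\, H^{-1} \right),
\qquad
\hat{\theta} = \arg\max_{\theta}\, \log \pi(\theta \mid x),
\quad
H = -\left. \nabla^2 \log \pi(\theta \mid x) \right|_{\theta = \hat{\theta}}.
```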
Lecture 9: Applications I
- Ever wondered what to do when both the number of trials and the success probability are unknown in a binomial model? Well, this paper by Adrian Raftery has an answer. See also this discussion, with JAGS and Stan implementations.
- This case study shows how to create a model from first (physical) principles.
Lecture 10: Applications II
- See Reporting Bayesian Results for a guide on which summaries are indispensable in a Bayesian analysis.
- Visualization in Bayesian workflow is a great paper about making useful graphs for a well-calibrated Bayesian analysis.
Lecture 11: Discussion of Bayes vs Frequentism
Disclaimer: everything in this section needs to be read with care so one does not become a zealot!
- See Jaynes's monograph above.
- See Frequentism and Bayesianism: A Practical Introduction for a five-part discussion of Bayesian versus orthodox statistics.
- Why isn't everyone a Bayesian? is a nice discussion of the trade-offs between paradigms by Bradley Efron.
- Holes in Bayesian statistics, by Andrew Gelman and Yuling Yao, is a collection of holes in Bayesian data analysis, such as conditional probability in the quantum realm, flat and weak priors, and model checking.
- Bayesian Estimation with Informative Priors is Indistinguishable from Data Falsification is a paper that attempts to draw a connection between strong priors and data falsification. Not for the faint of heart.
Computational resources
- A few pointers from my summer course.
- Darren Wilkinson's blog on parallel tempering. I took the code and applied it to our multimodal Cauchy example.
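For readers who want a feel for the algorithm without digging through the linked code, here is a minimal, self-contained Python sketch of parallel tempering on a bimodal Cauchy-mixture toy target; all names, temperatures, and tuning constants below are our illustrative assumptions, not Wilkinson's code:

```python
import numpy as np

rng = np.random.default_rng(42)

def log_target(x):
    """Log-density of an equal mixture of Cauchy(-5, 1) and Cauchy(5, 1),
    a bimodal toy target standing in for the multimodal Cauchy example."""
    return np.logaddexp(-np.log1p((x + 5.0) ** 2),
                        -np.log1p((x - 5.0) ** 2)) - np.log(2.0 * np.pi)

def parallel_tempering(n_iter=20_000, temps=(1.0, 2.0, 4.0, 8.0), step=2.0):
    betas = 1.0 / np.asarray(temps)   # inverse temperatures; betas[0] = 1
    K = len(betas)
    x = np.zeros(K)                   # current state of each tempered chain
    cold = np.empty(n_iter)
    for t in range(n_iter):
        # 1) One random-walk Metropolis move per chain, targeting target^beta.
        for k in range(K):
            prop = x[k] + step * rng.standard_normal()
            if np.log(rng.uniform()) < betas[k] * (log_target(prop) - log_target(x[k])):
                x[k] = prop
        # 2) Propose to swap the states of one random adjacent pair of chains.
        k = rng.integers(K - 1)
        log_acc = (betas[k] - betas[k + 1]) * (log_target(x[k + 1]) - log_target(x[k]))
        if np.log(rng.uniform()) < log_acc:
            x[k], x[k + 1] = x[k + 1], x[k]
        cold[t] = x[0]                # only the beta = 1 chain targets the posterior
    return cold

draws = parallel_tempering()
# If mixing works, the cold chain visits both modes (around -5 and +5):
print("fraction of draws in the right mode:", (draws > 0).mean())
```

The design point is that the hot (low-beta) chains see a flattened target and hop between modes easily, and the swap moves let those jumps propagate down to the cold chain, whose draws are the ones you keep.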