bcb731
Defense Against the Dark Arts
BCB 731: Critical readings in biomedical statistics and machine learning
BCB 731 (a.k.a. Defense Against the Dark Arts) is a survey of recurring statistical errors and pitfalls that are sometimes used to exaggerate the weight of evidence for novel biological claims or to inflate the estimated accuracy of proposed predictive biomedical models. The course focuses on misapplied analyses of data sources in which a small number of biological samples are measured across very high-dimensional feature spaces, such as genomics, proteomics, and biomedical imaging.
Crucially, this is not a course about data falsification or intentional research misconduct. Our focus is the hazy space where good intentions meet flawed incentives, motivated reasoning, and high-dimensional data.
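As a small illustration of how a small-sample, high-dimensional setting can inflate estimated accuracy, the sketch below simulates pure noise: 40 samples, 1,000 features, and labels that have no relationship to the features. It estimates the leave-one-out accuracy of a nearest-centroid classifier in two ways. In the first, the top features are chosen once on all samples before cross-validation. In the second, the selection is repeated inside each fold. The sample sizes, the selection rule (absolute difference in class means), and the classifier are illustrative assumptions, not course material. This is a plain-Python sketch, not a reference implementation.

```python
import random


def simulate_noise_data(n_per_class=20, n_features=1000, seed=0):
    """Pure-noise data: features are independent N(0, 1) and labels carry no signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(2 * n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    return X, y


def top_k_features(X, y, rows, k):
    """Rank features by |mean(class 1) - mean(class 0)| computed over `rows` only."""
    scores = []
    for j in range(len(X[0])):
        sums, counts = [0.0, 0.0], [0, 0]
        for i in rows:
            sums[y[i]] += X[i][j]
            counts[y[i]] += 1
        diff = sums[1] / counts[1] - sums[0] / counts[0]
        scores.append((abs(diff), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_predict(X, y, train_rows, features, x_new):
    """Assign x_new to the class whose centroid (on `features`) is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [i for i in train_rows if y[i] == label]
        dist = 0.0
        for j in features:
            centroid_j = sum(X[i][j] for i in members) / len(members)
            dist += (x_new[j] - centroid_j) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k=20, select_inside_folds=True):
    """Leave-one-out accuracy with feature selection inside each fold, or once on all data."""
    all_rows = list(range(len(X)))
    # The "leaky" variant lets every sample, including future held-out ones, vote on features.
    global_features = None if select_inside_folds else top_k_features(X, y, all_rows, k)
    correct = 0
    for held_out in all_rows:
        train_rows = [i for i in all_rows if i != held_out]
        features = (top_k_features(X, y, train_rows, k)
                    if select_inside_folds else global_features)
        pred = nearest_centroid_predict(X, y, train_rows, features, X[held_out])
        correct += pred == y[held_out]
    return correct / len(X)


X, y = simulate_noise_data()
leaky = loo_accuracy(X, y, select_inside_folds=False)   # selection saw the held-out sample
honest = loo_accuracy(X, y, select_inside_folds=True)   # selection repeated in every fold
print(f"leaky LOO accuracy:  {leaky:.2f}")
print(f"honest LOO accuracy: {honest:.2f}")
```

Because the labels are random, any accuracy meaningfully above 0.5 is an artifact. The leaky estimate typically lands far above chance, while the honest estimate stays near it. The only difference between the two is whether the held-out sample influenced which features were kept.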
Fall 2023 Schedule
Links
REPRODUCIBILITY CRISIS
- The Center for Open Science
- Investigating the replicability of preclinical cancer biology
- The preregistration revolution
- Why Hypothesis Testers Should Spend Less Time Testing Hypotheses
- Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results
- The Connection Between Varying Treatment Effects and the Crisis of Unreplicable Research: A Bayesian Perspective
- Why Most Published Research Findings Are False
- Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis
- Observing Many Researchers Using the Same Data and Hypothesis Reveals a Hidden Universe of Uncertainty
- Same data, different analysts: variation in effect sizes due to analytical decisions in ecology and evolutionary biology
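One of the readings above, "Why Most Published Research Findings Are False", frames replicability in terms of the positive predictive value (PPV) of a statistically significant result. In its simplest form, with no bias term and a single team testing the hypothesis, PPV = (1 − β)R / (R − βR + α). Here R is the pre-study odds that a tested relationship is real, α is the type I error rate, and 1 − β is statistical power. The following is a short sketch, and the example inputs are chosen purely for illustration:

```python
def positive_predictive_value(R, alpha=0.05, power=0.8):
    """Post-study probability that a 'significant' finding reflects a real relationship.

    R     -- pre-study odds that a tested relationship is real (true : null)
    alpha -- type I error rate
    power -- 1 - beta, probability of detecting a real relationship
    Simplest form only: ignores bias and multiple competing teams.
    """
    beta = 1.0 - power
    return (1.0 - beta) * R / (R - beta * R + alpha)


# Well-powered test of a fairly plausible hypothesis (pre-study odds 1:10)
print(round(positive_predictive_value(R=0.1, alpha=0.05, power=0.8), 3))   # 0.615
# Underpowered exploratory screen with pre-study odds of roughly 1:100
print(round(positive_predictive_value(R=0.01, alpha=0.05, power=0.2), 3))  # 0.038
```

The second case resembles a small-sample, high-dimensional screen. Under these assumptions, fewer than 4% of its "significant" hits would be real, even though every individual test uses the conventional α = 0.05.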
P-HACKING (AND RELATED COMMON DISASTERS IN STATISTICAL HYPOTHESIS TESTING)
- P-Hacking and the Problem of Multiple Comparisons
- The Extent and Consequences of P-Hacking in Science
- Gazing into the Abyss of P-Hacking: HARKing vs. Optional Stopping
- The problem with p-hacking is not the “hacking,” it’s the “p” (or, Fisher is just fine on this one)
- Four best practices for measuring news sentiment using ‘off-the-shelf’ dictionaries: a large-scale p-hacking experiment
- The Myriad Forms of p-Hacking
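Two mechanisms named in these readings, uncorrected multiple comparisons and optional stopping, are easy to see in a simulation where every null hypothesis is true. This sketch assumes a one-sample z-test with known σ = 1. That simplification lets the code use only the Python standard library. The batch size, number of looks, and test counts are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_p(sample_mean, n, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, assuming sigma is known."""
    z = sample_mean / (sigma / n ** 0.5)
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def multiple_comparisons_rate(n_tests=10_000, n=20, alpha=0.05, seed=1):
    """Fraction of purely null features declared significant with no correction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        hits += z_test_p(mean, n) < alpha
    return hits / n_tests


def optional_stopping_rate(n_experiments=2_000, batch=10, max_looks=10,
                           alpha=0.05, seed=2):
    """Fraction of null experiments declared significant when peeking after each batch."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        total, n = 0.0, 0
        for _ in range(max_looks):
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
            n += batch
            if z_test_p(total / n, n) < alpha:
                false_positives += 1
                break  # stop collecting data at the first p < alpha
    return false_positives / n_experiments


print(multiple_comparisons_rate())  # close to the nominal 0.05
print(optional_stopping_rate())     # well above the nominal 0.05
```

Suppose 10,000 null features are each tested at α = 0.05. About 500 of them are expected to come out "significant" by chance alone. Now consider a single experiment where the analyst peeks after every batch of 10 observations and stops at the first p < 0.05. This pushes the false-positive rate well above the nominal 5%, even though each individual test is valid on its own.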
RESEARCH SCANDALS
- A top Cornell food researcher has had 15 studies retracted. That’s a lot.
- How Shoddy Statistics Found A Home In Sports Research
- When the Revolution Came for Amy Cuddy
- Stanford president resigns over manipulated research, will retract at least three papers
- How Bright Promise in Cancer Testing Fell Apart
- Nobel Prize winner Gregg Semenza retracts four papers
- Duke U to Pay $112 Million in Fake Data Scandal
- After honesty researcher's retractions, colleagues expand scrutiny of her work
- They studied dishonesty. Was their work a lie?
OTHER CLASSES
EARLY 20TH CENTURY STATISTICS
- Historical Origins of Statistical Testing Practices: The Treatment of Fisher versus Neyman-Pearson Views in Textbooks
- The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?
- Karl Pearson: An Appreciation of Some Aspects of His Life and Work
EXPLORATORY DATA ANALYSIS
- John Tukey's Exploratory Data Analysis: Past, Present, and Future
- 2010 review of Exploratory Data Analysis
- 2009 review of Exploratory Data Analysis
- Statistical inference for exploratory data analysis and model diagnostics
STATS/ML
- Generative Models: An Interdisciplinary Perspective
- Inference in the age of big data: Future perspectives on neuroscience
- Statistical Learning Theory: Models, Concepts, and Results
STATS/ML BOOKS
- Computer Age Statistical Inference
- Elements of Statistical Learning
- Advanced Data Analysis from an Elementary Point of View
- Pattern Recognition and Machine Learning
- The Nature of Statistical Learning Theory
- Modern Statistics for Modern Biology
MODEL OVERFITTING / INTERPOLATIVE MEMORIZATION (AKA DOUBLE DESCENT)
- Lecture on Bias-Variance Trade-off in Machine Learning
- Model Selection: Underfitting, Overfitting, and the Bias-Variance Tradeoff
- Reconciling modern machine learning practice and the bias-variance trade-off
- Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models
- There is no Double-Descent in Random Forests
- No Double Descent in PCA: Training and Pre-Training in High Dimensions
- Unifying Grokking and Double Descent
- Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle
CAUSAL INFERENCE
PRE-16TH CENTURY SCIENCE & PROTO-SCIENCE
- Archimedes' Sand-Reckoner
- Mathematical Treasures - Zhoubi suanjing
- Aristarchus of Samos and Graeco-Babylonian Astronomy
- Pliny's Natural History
- Aryabhatiya of Aryabhata
- Islamic astronomy & celestial cartography
PRE-MODERN STATISTICS
- Domesday Book
- John Graunt's Natural and Political Observations Made upon the Bills of Mortality
- An argument for divine providence, taken from the constant regularity observ'd in the births of both sexes