Logistic-Regression-Assumptions
Assumptions of Logistic Regression, Clearly Explained
Understanding and implementing the assumption checks behind one of the most important statistical techniques in data science - Logistic Regression
- Link to TowardsDataScience article: https://towardsdatascience.com/assumptions-of-logistic-regression-clearly-explained-44d85a22b290
- Logistic regression is a highly effective modeling technique that has remained a mainstay in statistics since its development in the 1940s.
- Given its popularity and utility, data practitioners should understand the fundamentals of logistic regression before using it to tackle data and business problems.
- In this project, we explore the key assumptions of logistic regression with theoretical explanations and practical Python implementation of the assumption checks.
Contents
(1) Logistic_Regression_Assumptions.ipynb
- The main notebook, containing the Python code (with explanations) for checking each of the 6 key assumptions of logistic regression
(2) Box-Tidwell-Test-in-R.ipynb
- Notebook containing R code for running the Box-Tidwell test (to check the linearity-of-the-logit assumption)
(3) /data
- Folder containing the public Titanic dataset (train set)
(4) /references
- Folder containing several sets of lecture notes explaining advanced regression
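As a rough illustration of the idea behind the Box-Tidwell test listed in item (2): for a strictly positive continuous predictor x, an x * ln(x) term is added to the logistic model, and a statistically significant coefficient on that term suggests the logit is not linear in x. The sketch below is not the code from either notebook. It uses only NumPy with a hand-written Newton-Raphson fit, and the helper names (`fit_logit`, `box_tidwell_pvalue`) and the synthetic data are assumptions made for this example.

```python
import math
import numpy as np


def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson (hypothetical helper).

    Returns coefficients, standard errors and two-sided Wald p-values.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Covariance of the estimates from the inverse information matrix
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    se = np.sqrt(np.diag(cov))
    pvals = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in beta / se])
    return beta, se, pvals


def box_tidwell_pvalue(x, y):
    """Wald p-value of the x * ln(x) term for a single predictor."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Tidwell needs strictly positive predictor values")
    # Design matrix: intercept, x, and the Box-Tidwell term x * ln(x)
    X = np.column_stack([np.ones_like(x), x, x * np.log(x)])
    _, _, pvals = fit_logit(X, np.asarray(y, dtype=float))
    return pvals[2]


# Synthetic data (assumption): the true logit is quadratic in x,
# so the linearity assumption is deliberately violated.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=5000)
true_logit = -4.0 + 0.1 * x**2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

p_value = box_tidwell_pvalue(x, y)
print(f"p-value of x*ln(x) term: {p_value:.3g}")
```

A small p-value here flags the non-linearity that was built into the synthetic data. Predictors containing zeros or negative values need to be shifted or handled separately before the log is taken.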
Special Thanks
- @dataninj4 for correcting imports and adding .loc referencing in diagnosis_df cell so that it runs without errors in Python 3.6/3.8
- @ArneTR for rightly pointing out that VIF calculation should include a constant, and correlation matrix should exclude target variable
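The two corrections credited to @ArneTR can be illustrated with a minimal sketch. The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors, and that auxiliary regression should contain an intercept; leaving the constant out can distort the result. Likewise, the correlation matrix used for the multicollinearity check is computed on the predictors only. The code below is a NumPy-only illustration on synthetic data, not the notebook's implementation; the function name `vif_with_constant` and the data are assumptions.

```python
import numpy as np


def vif_with_constant(X):
    """VIF for each column of X, with an intercept in every auxiliary regression."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Include the constant so R^2 is measured around the mean
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = resid @ resid
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Synthetic predictors (assumption): x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
predictors = np.column_stack([x1, x2, x3])

vifs = vif_with_constant(predictors)
print("VIF:", np.round(vifs, 2))

# Correlation matrix on the predictors only; the target variable is left out.
corr = np.corrcoef(predictors, rowvar=False)
print(np.round(corr, 2))
```

In this example the two near-duplicate columns get large VIF values while the independent column stays close to 1. With statsmodels, a comparable result would typically be obtained by adding a constant column to the design matrix before calling its `variance_inflation_factor` function.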
References
- Machine Learning Essentials - Practical Guide in R
- Logistic and Linear Regression Assumptions - Violation Recognition and Control
- Testing linearity in the logit using Box-Tidwell Transformation in SPSS - Youtube
- Logistic Regression using SPSS
- Statistics How To - Cook's Distance
- Statsmodels Documentation - GLM
- Statsmodels Documentation - Logit Influence example notebook
- PennState Eberly College of Science - Stat 462
- Statistics Solution - Assumptions of Logistic Regression
- Course Notes for IS 6489 - Statistics and Predictive Analytics
- MSc in Big Data Analytics at Carlos III University of Madrid - Notes for Predictive Modeling
- Freakonometrics - Residuals from a Logistic Regression
- Kaggle - Titanic - Logistic Regression with Python
- Yellowbrick API Reference - Cook's Distance
- DataCamp - Understanding Logistic Regression in Python
- Statology - How to Calculate Cook's Distance
- ResearchGate - Box-Tidwell Test in SPSS
- CrossValidated - Why include x ln x interaction term helps
- UCLA IDRE - Logistic Regression Diagnostics