Assumptions of Logistic Regression, Clearly Explained

Understanding and implementing the assumption checks behind one of the most important statistical techniques in data science: logistic regression

  • Link to TowardsDataScience article: https://towardsdatascience.com/assumptions-of-logistic-regression-clearly-explained-44d85a22b290
  • Logistic regression is a highly effective modeling technique that has remained a mainstay in statistics since its development in the 1940s.
  • Given its popularity and utility, data practitioners should understand the fundamentals of logistic regression before using it to tackle data and business problems.
  • In this project, we explore the key assumptions of logistic regression through theoretical explanations and practical Python implementations of the assumption checks.

Contents

(1) Logistic_Regression_Assumptions.ipynb

  • The main notebook, containing Python code (with explanations) that shows how to check each of the 6 key assumptions of logistic regression

(2) Box-Tidwell-Test-in-R.ipynb

  • Notebook containing R code for running the Box-Tidwell test (to check the linearity-of-the-logit assumption)

(3) /data

  • Folder containing the public Titanic dataset (train set)

(4) /references

  • Folder containing several sets of lecture notes on advanced regression topics

Special Thanks

  • @dataninj4 for correcting the imports and adding .loc referencing in the diagnosis_df cell so that it runs without errors in Python 3.6/3.8
  • @ArneTR for rightly pointing out that the VIF calculation should include a constant, and that the correlation matrix should exclude the target variable
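
The two points raised by @ArneTR can be illustrated with a small sketch. This is not the notebook's code: it uses plain NumPy, a hypothetical helper named `vif_with_constant`, and made-up synthetic predictors (`x1`, `x2`, `x3`), purely to show why the auxiliary regressions behind VIF need an intercept and why only predictors belong in the correlation matrix.

```python
import numpy as np


def vif_with_constant(X):
    """Return the VIF of each column of X (n_samples x n_features).

    Hypothetical helper for illustration. Each predictor is regressed on
    the remaining predictors PLUS an intercept column, and
    VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # The column of ones is the constant; without it, R^2 is computed
        # around zero instead of the mean and the VIFs are distorted.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Synthetic predictors (assumed data, not the Titanic dataset).
rng = np.random.default_rng(0)
x1 = rng.normal(50, 5, 500)
x2 = x1 + rng.normal(0, 5, 500)  # deliberately correlated with x1
x3 = rng.normal(10, 2, 500)      # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

vifs = vif_with_constant(X)

# Correlation matrix built from the predictors only; the target variable
# is deliberately left out, since multicollinearity concerns the features.
corr = np.corrcoef(X, rowvar=False)

print("VIF per predictor:", vifs)
print("Predictor correlation matrix:\n", corr)
```

As a sanity check, VIFs computed with an intercept should match the diagonal of the inverse of the predictor correlation matrix, which is one way to confirm the constant has been handled correctly.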

References