Add Lasso regression model with L1 regularization for feature selection

Open MikeDank opened this issue 1 month ago • 0 comments

Contribution Type

New Model: Lasso Regression with L1 Regularization

Author

Michael Dankanich (Mdanka2)

Paper

Barttender: https://arxiv.org/abs/2411.12707

Description

This PR adds a Lasso regression model to PyHealth with L1 regularization for automatic feature selection in healthcare prediction tasks. The model extends the existing LogisticRegression architecture by adding:

  • L1 penalty (alpha parameter) for sparse weight learning during training
  • get_feature_importance() method for extracting feature importance scores
  • get_selected_features(threshold) method for identifying selected features
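The three additions can be illustrated with a framework-free sketch. This is not the code in pyhealth/models/lasso.py: the standalone function names, the use of absolute weight as the importance score, the strict `>` comparison against the threshold, and the feature names and weights are all assumptions made for illustration. In the actual model these would presumably operate on tensors rather than plain lists.

```python
from typing import Dict, List, Sequence


def l1_penalty(weights: Sequence[float], alpha: float) -> float:
    """Return alpha * sum(|w_i|), the term added to the task loss."""
    return alpha * sum(abs(w) for w in weights)


def feature_importance(
    names: Sequence[str], weights: Sequence[float]
) -> Dict[str, float]:
    """Score each feature by the absolute value of its weight (assumed definition)."""
    return {name: abs(w) for name, w in zip(names, weights)}


def selected_features(importance: Dict[str, float], threshold: float) -> List[str]:
    """Keep features whose importance is strictly above the threshold (assumed rule)."""
    return [name for name, score in importance.items() if score > threshold]


# Hypothetical feature names and learned weights, for illustration only.
names = ["lactate", "heart_rate", "sodium", "age"]
weights = [0.5, -0.25, 0.0, 0.125]

# Penalty term: 0.1 * (0.5 + 0.25 + 0.0 + 0.125)
penalty = l1_penalty(weights, alpha=0.1)

importance = feature_importance(names, weights)
kept = selected_features(importance, threshold=0.2)
```

With these made-up weights, only the two features whose absolute weight exceeds 0.2 survive selection. A weight that the L1 penalty has driven to exactly zero, such as `sodium` here, is dropped for any non-negative threshold.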

This helps clinicians identify which clinical variables (labs, vitals, etc.) are most predictive of a given outcome, which improves model interpretability and can reduce the number of clinical measurements a prediction requires.

Files Changed

  • pyhealth/models/lasso.py (NEW, 362 lines)
  • pyhealth/models/__init__.py (MODIFIED, +1 line)
  • tests/core/test_lasso.py (NEW, 280 lines, 10 tests)

Test Results

All 10 tests pass:

Ran 10 tests in 0.084s
OK

Tests cover:

  • Model initialization
  • Forward/backward pass
  • L1 regularization verification
  • Feature importance extraction
  • Feature selection functionality
  • Custom alpha values
  • Custom embedding dimensions
  • Regression task support
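As a rough idea of what an "L1 regularization verification" test could look like, here is a minimal `unittest` sketch. It does not reproduce tests/core/test_lasso.py. The `total_loss` helper and the test names are hypothetical, and the sketch checks only the arithmetic of the penalty term, not the PyHealth model itself.

```python
import unittest
from typing import Sequence


def total_loss(task_loss: float, weights: Sequence[float], alpha: float) -> float:
    """Hypothetical helper: task loss plus the L1 penalty alpha * sum(|w_i|)."""
    return task_loss + alpha * sum(abs(w) for w in weights)


class TestL1Penalty(unittest.TestCase):
    def test_zero_alpha_leaves_loss_unchanged(self) -> None:
        self.assertEqual(total_loss(1.0, [0.5, -2.0], alpha=0.0), 1.0)

    def test_penalty_grows_with_alpha(self) -> None:
        small = total_loss(1.0, [0.5, -2.0], alpha=0.01)
        large = total_loss(1.0, [0.5, -2.0], alpha=0.1)
        self.assertAlmostEqual(small, 1.025)
        self.assertAlmostEqual(large, 1.25)
        self.assertLess(small, large)

    def test_sign_of_weight_is_ignored(self) -> None:
        self.assertEqual(
            total_loss(0.0, [2.0], alpha=0.1),
            total_loss(0.0, [-2.0], alpha=0.1),
        )


# Run the suite explicitly so the sketch also works when pasted into a larger script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestL1Penalty)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```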

Documentation

  • Complete Google-style docstrings with Args, Note, and Examples sections
  • Type hints for all functions following PyHealth conventions
  • Runnable examples with expected output
  • Alpha tuning guidance for clinical data (0.01-0.1 range recommended)
  • Follows PEP 8 style with an 88-character line length (the Black default, rather than PEP 8's own 79)
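For the alpha tuning guidance, one common way to explore a range such as 0.01 to 0.1 is a log-spaced grid of candidate values. The sketch below only generates such a grid. The helper name `log_spaced_alphas` and the choice of five candidates are assumptions, not part of the PR, and selecting among the candidates would still require a validation metric of the reviewer's choosing.

```python
from typing import List


def log_spaced_alphas(low: float, high: float, count: int) -> List[float]:
    """Return `count` values from low to high, evenly spaced on a log scale."""
    if count < 2:
        raise ValueError("count must be at least 2")
    # Constant multiplicative step between neighbouring candidates.
    ratio = (high / low) ** (1.0 / (count - 1))
    return [low * ratio**i for i in range(count)]


# Candidate alphas covering the recommended 0.01-0.1 range.
alphas = log_spaced_alphas(0.01, 0.1, count=5)
```

Each candidate would then be used to train one model, with the alpha chosen according to held-out performance and how many features remain selected.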

Files to Review

  • Main implementation: pyhealth/models/lasso.py
  • Test suite: tests/core/test_lasso.py

MikeDank, Dec 07 '25 01:12