PyHealth
Add Lasso regression model with L1 regularization for feature selection
Contribution Type
New Model: Lasso Regression with L1 Regularization
Author
Michael Dankanich (Mdanka2)
Paper
Barttender: https://arxiv.org/abs/2411.12707
Description
This PR adds a Lasso regression model to PyHealth with L1 regularization for automatic feature selection in healthcare prediction tasks. The model extends the existing LogisticRegression architecture by adding:
- L1 penalty (alpha parameter) for sparse weight learning during training
- get_feature_importance() method for extracting feature importance scores
- get_selected_features(threshold) method for identifying selected features
This lets clinicians identify which clinical variables (labs, vitals, etc.) are most predictive of a given outcome, which improves model interpretability and may reduce the number of clinical tests that need to be ordered.
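As a rough illustration of the idea described above, the sketch below adds an L1 term to a loss value and derives importance scores and a selected-feature list from a weight vector. It is not the code in pyhealth/models/lasso.py: the helper names, the example weights, and the rule that importance equals the absolute weight are assumptions made for illustration only.

```python
from typing import Dict, List


def l1_penalized_loss(base_loss: float, weights: List[float], alpha: float) -> float:
    """Return base_loss + alpha * sum(|w|); hypothetical helper, not PyHealth code."""
    return base_loss + alpha * sum(abs(w) for w in weights)


def feature_importance(names: List[str], weights: List[float]) -> Dict[str, float]:
    """Assumed scoring rule: importance is the absolute weight of each feature."""
    return {name: abs(w) for name, w in zip(names, weights)}


def select_features(importance: Dict[str, float], threshold: float) -> List[str]:
    """Keep features whose importance is strictly above the threshold."""
    return [name for name, score in importance.items() if score > threshold]


# Made-up weights for three illustrative clinical variables.
names = ["glucose", "heart_rate", "age"]
weights = [0.8, 0.0, -0.25]

loss = l1_penalized_loss(base_loss=0.5, weights=weights, alpha=0.1)
importance = feature_importance(names, weights)
selected = select_features(importance, threshold=1e-3)
print(selected)  # ['glucose', 'age']
```

A feature whose weight has been driven to zero by the penalty (heart_rate here) falls below any positive threshold and is dropped from the selection.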
Files Changed
- pyhealth/models/lasso.py (NEW, 362 lines)
- pyhealth/models/__init__.py (MODIFIED, +1 line)
- tests/core/test_lasso.py (NEW, 280 lines, 10 tests)
Test Results
All 10 tests pass successfully:
Ran 10 tests in 0.084s OK
Tests cover:
- Model initialization
- Forward/backward pass
- L1 regularization verification
- Feature importance extraction
- Feature selection functionality
- Custom alpha values
- Custom embedding dimensions
- Regression task support
Documentation
- Complete Google-style docstrings with Args, Note, and Examples sections
- Type hints for all functions following PyHealth conventions
- Runnable examples with expected output
- Alpha tuning guidance for clinical data (0.01-0.1 range recommended)
- Follows PEP8 style (88 char line length)
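To give a feel for the alpha tuning guidance in the list above, here is a small sketch of how a larger alpha leaves fewer non-zero weights. It uses the soft-thresholding operator, the standard closed-form shrinkage step associated with an L1 penalty. That is a technique swapped in for illustration: this PR describes adding the penalty during training, and nothing here is taken from its implementation. The weights are made up.

```python
from typing import Dict, List


def soft_threshold(w: float, alpha: float) -> float:
    """Shrink w toward zero by alpha, and clamp to exactly zero if |w| <= alpha."""
    if w > alpha:
        return w - alpha
    if w < -alpha:
        return w + alpha
    return 0.0


def count_nonzero(weights: List[float], alpha: float) -> int:
    """Count weights that remain non-zero after soft-thresholding with alpha."""
    return sum(1 for w in weights if soft_threshold(w, alpha) != 0.0)


# Made-up weights; the alpha values are the ends and middle of the 0.01-0.1 range.
weights = [0.8, 0.04, -0.25, 0.005, -0.07]
surviving: Dict[float, int] = {
    alpha: count_nonzero(weights, alpha) for alpha in (0.01, 0.05, 0.1)
}
print(surviving)  # {0.01: 4, 0.05: 3, 0.1: 2}
```

In practice the right alpha depends on the dataset and is best chosen by validation performance; the counts above only show the direction of the effect.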
Files to Review
- Main implementation: pyhealth/models/lasso.py
- Test suite: tests/core/test_lasso.py