Add Lasso regression model with L1 regularization for feature selection

Open MikeDank opened this issue 1 month ago • 0 comments

Contribution Type

New Model: Lasso Regression with L1 Regularization

Author

Michael Dankanich (Mdanka2)

Paper

Barttender: https://arxiv.org/abs/2411.12707

Description

This PR adds a Lasso regression model with L1 regularization to PyHealth, providing automatic feature selection for healthcare prediction tasks. The model extends the existing LogisticRegression architecture with:

L1 penalty (alpha parameter) for sparse weight learning during training
get_feature_importance() method for extracting feature importance scores
get_selected_features(threshold) method for identifying selected features

This lets clinicians identify which clinical variables (labs, vitals, etc.) are most predictive of a given outcome, which improves model interpretability and may reduce the number of tests that need to be ordered.
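To make the three additions above concrete, here is a minimal, framework-free sketch of the idea. It assumes importance is the absolute value of each learned weight and that selection keeps features whose importance exceeds the threshold; the actual `get_feature_importance()` and `get_selected_features(threshold)` in `pyhealth/models/lasso.py` may differ. The helper names, feature names, and weights below are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Sequence


def l1_penalty(weights: Sequence[float], alpha: float) -> float:
    """Return alpha * sum(|w|), the term added to the task loss."""
    return alpha * sum(abs(w) for w in weights)


def feature_importance(
    names: Sequence[str], weights: Sequence[float]
) -> Dict[str, float]:
    """Assumed definition: importance is the absolute weight of each feature."""
    return {name: abs(w) for name, w in zip(names, weights)}


def selected_features(
    importance: Dict[str, float], threshold: float = 1e-3
) -> List[str]:
    """Keep the features whose importance is strictly above the threshold."""
    return [name for name, score in importance.items() if score > threshold]


# Hypothetical feature names and learned weights, not taken from the PR.
names = ["lactate", "heart_rate", "sodium", "age"]
weights = [0.82, -0.31, 0.0, 0.0004]

task_loss = 0.5  # stand-in for the unpenalized prediction loss
total_loss = task_loss + l1_penalty(weights, alpha=0.01)

importance = feature_importance(names, weights)
kept = selected_features(importance, threshold=1e-3)
print(kept)
```

With these made-up weights, the two features whose weights are at or near zero fall below the threshold and are dropped, which is the sparsity effect the L1 term is meant to encourage.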

Files Changed

pyhealth/models/lasso.py (NEW, 362 lines)
pyhealth/models/__init__.py (MODIFIED, +1 line)
tests/core/test_lasso.py (NEW, 280 lines, 10 tests)

Test Results

All 10 tests pass:

Ran 10 tests in 0.084s OK

Tests cover:

Model initialization
Forward/backward pass
L1 regularization verification
Feature importance extraction
Feature selection functionality
Custom alpha values
Custom embedding dimensions
Regression task support
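As an illustration of what an "L1 regularization verification" check could look like, the sketch below uses `unittest` against a stand-alone penalty function. It is not copied from `tests/core/test_lasso.py`; the `l1_penalty` helper and the test names are hypothetical, and the real tests presumably exercise the PyHealth model directly.

```python
import unittest
from typing import Sequence


def l1_penalty(weights: Sequence[float], alpha: float) -> float:
    """Hypothetical helper: alpha times the sum of absolute weights."""
    return alpha * sum(abs(w) for w in weights)


class L1PenaltyTest(unittest.TestCase):
    weights = [0.5, -0.25, 0.0]

    def test_zero_alpha_adds_nothing(self):
        # With alpha = 0 the model reduces to the unpenalized loss.
        self.assertEqual(l1_penalty(self.weights, alpha=0.0), 0.0)

    def test_penalty_scales_linearly_with_alpha(self):
        small = l1_penalty(self.weights, alpha=0.01)
        large = l1_penalty(self.weights, alpha=0.1)
        self.assertAlmostEqual(large, 10 * small)

    def test_penalty_ignores_sign(self):
        # Flipping every weight's sign must leave the penalty unchanged.
        flipped = [-w for w in self.weights]
        self.assertAlmostEqual(
            l1_penalty(self.weights, alpha=0.05),
            l1_penalty(flipped, alpha=0.05),
        )


# Run the suite programmatically so the snippet also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(L1PenaltyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```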

Documentation

Complete Google-style docstrings with Args, Note, and Examples sections
Type hints for all functions following PyHealth conventions
Runnable examples with expected output
Alpha tuning guidance for clinical data (0.01-0.1 range recommended)
Follows PEP 8 style with an 88-character line limit (Black's default)
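To give a feel for why the choice of alpha matters across the recommended 0.01-0.1 range, the sketch below applies the soft-thresholding operator, which is the closed-form Lasso solution for the objective 0.5 * ||y - Xw||^2 + alpha * ||w||_1 when the design matrix is orthonormal. This is a simplification chosen for illustration and is not the training procedure used in this PR; the unpenalized weights are made up.

```python
from typing import Dict, List


def soft_threshold(w: float, alpha: float) -> float:
    """Shrink w toward zero by alpha; values within [-alpha, alpha] become 0."""
    if w > alpha:
        return w - alpha
    if w < -alpha:
        return w + alpha
    return 0.0


# Hypothetical unpenalized weights for five features.
unpenalized: List[float] = [0.82, -0.31, 0.06, -0.02, 0.004]

# Map each alpha to the number of features that survive shrinkage.
sparsity: Dict[float, int] = {}
for alpha in (0.01, 0.05, 0.1):
    shrunk = [soft_threshold(w, alpha) for w in unpenalized]
    sparsity[alpha] = sum(1 for w in shrunk if w != 0.0)
    print(f"alpha={alpha}: {sparsity[alpha]} of {len(unpenalized)} features kept")
```

In this toy setting, raising alpha from 0.01 to 0.1 drops the surviving features from four to two, so a sweep over the range on validation data is a reasonable way to trade predictive performance against sparsity.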

Files to Review

Main implementation: pyhealth/models/lasso.py
Test suite: tests/core/test_lasso.py

Dec 07 '25 01:12 MikeDank