skglm icon indicating copy to clipboard operation
skglm copied to clipboard

FEAT - Add GroupLasso and GroupLogisticRegression estimators

Open mathurinm opened this issue 1 year ago • 14 comments

mathurinm avatar Oct 30 '23 15:10 mathurinm

Hello, following up on the issue from Celer, it would be great to be able to run GroupLasso with non-neg weights. The standard Lasso already uses positive=True. Would that be possible to implement for GroupLasso?

For my application I also need to run on sparse X input matrices. From what I have seen, some of the regressors can handle the csc matrices. Would that be possible for GroupLasso?

tomaszkacprzak avatar Jan 09 '24 19:01 tomaszkacprzak

Hello, Thanks a lot for your interest!

  • adding a group Lasso with positive weights should be "easy", we should be able to do it quickly (i.e. this week if we have time)
  • handling sparse matrices for the group Lasso is definitely doable, but might take a bit more time to check that the implementation is correct

QB3 avatar Jan 09 '24 22:01 QB3

Thank you, I am glad that it is doable. Is there anything I can do to help?

tomaszkacprzak avatar Jan 10 '24 08:01 tomaszkacprzak

@tomaszkacprzak we can start by supporting sparse matrices for the GroupBCD solver. Do you think you could send a PR doing that ?

You'd need to implement construct_grad_sparse and bcd_epoch_sparse in solvers/group_bcd.py. They should do the same thing as their existing dense versions (same name without _sparse), but when X is a csc matrix, passed as a triplet X_data, X_indices, X_indptr

See current implementations for CD (not BCD) in solvers/anderson_cd.py: sparse version https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L313 vs dense version https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L272

Then inside the solver we just have an if when doing the epoch of coordinate descent : https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L147

Do not hesitate to start a very WIP PR, so that we can discuss over there!

mathurinm avatar Jan 10 '24 16:01 mathurinm

In the meantime, I will open a separate PR for the positive weights

QB3 avatar Jan 10 '24 17:01 QB3

Do we want a GroupLasso estimator as well ?

QB3 avatar Jan 12 '24 20:01 QB3

I think it's a popular enough model such that the visibility benefits of implementing its class overweighs the cost of maintenance: I am +1 on the GroupLasso estimator

mathurinm avatar Jan 15 '24 09:01 mathurinm

@tomaszkacprzak did you try our implementation of positive for the Group Lasso ? ANd would you be interested in implementing sparse X support for group ?

mathurinm avatar Feb 14 '24 08:02 mathurinm

@mathurinm I just saw it, going to give it a try. Then I'll start the PR for sparse X. Thanks!

tomaszkacprzak avatar Mar 04 '24 13:03 tomaszkacprzak

@mathurinm I noticed that in test_group.py L79 the random seed is set so that for positive=True all come out 0. When I change the random seed, some positive weights appear, and the error is thrown.

Example for random seed 12323

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 5 / 50 (10%)
Max absolute difference: 0.46942555
Max relative difference: 0.22505575
 x: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.183918,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.295971, 0.      , 0.      ,...
 y: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.241598, 0.      , 0.      ,...

Any suggestions what can cause this?

tomaszkacprzak avatar Mar 04 '24 13:03 tomaszkacprzak

Thanks for the report, this seems to show that the positive=True feature introduced in https://github.com/scikit-learn-contrib/skglm/commit/1dcbfffa91999c68e23ab76c8b5dc00456766705 was not correct. I will investigate this week.

QB3 avatar Mar 04 '24 14:03 QB3

Yes indeed @QB3, I open a PR #225

Badr-MOUFAD avatar Mar 04 '24 16:03 Badr-MOUFAD

This should be fixed now @tomaszkacprzak, thanks @Badr-MOUFAD

mathurinm avatar Mar 06 '24 12:03 mathurinm

Great, I just checked it out, works well. Thanks @Badr-MOUFAD. I'll start a PR for sparse X.

tomaszkacprzak avatar Mar 06 '24 18:03 tomaszkacprzak

Fixed by #228

mathurinm avatar May 31 '24 11:05 mathurinm