skglm
FEAT - Add GroupLasso and GroupLogisticRegression estimators
Hello, following up on the issue from Celer, it would be great to be able to run GroupLasso with non-negative weights. The standard Lasso already supports positive=True. Would it be possible to implement this for GroupLasso?
For my application I also need to run on sparse X input matrices. From what I have seen, some of the regressors can handle csc matrices. Would that be possible for GroupLasso as well?
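For concreteness, here is a rough sketch of the usage I have in mind. The Lasso part should already work today; the GroupLasso call at the end is the requested API (its positive argument and sparse-X support are exactly what this issue asks for), so treat it as hypothetical:

```python
import numpy as np
from scipy import sparse
from skglm import Lasso

# toy data: sparse CSC design matrix and a random target
X = sparse.random(100, 20, density=0.1, format="csc", random_state=0)
y = np.random.default_rng(0).standard_normal(100)

# the plain Lasso already exposes this: non-negative coefficients, sparse X
lasso = Lasso(alpha=0.1, positive=True).fit(X, y)
print(lasso.coef_.min())  # >= 0

# requested (hypothetical) API: the same two options on the group estimator
# GroupLasso(groups=5, alpha=0.1, positive=True).fit(X, y)
```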
Hello, thanks a lot for your interest!
- adding a group Lasso with positive weights should be "easy"; we should be able to do it quickly (i.e. this week if we have time)
- handling sparse matrices for the group Lasso is definitely doable, but might take a bit more time to check that the implementation is correct
Thank you, I am glad that it is doable. Is there anything I can do to help?
@tomaszkacprzak we can start by supporting sparse matrices for the GroupBCD solver. Do you think you could send a PR doing that?
You'd need to implement construct_grad_sparse and bcd_epoch_sparse in solvers/group_bcd.py. They should do the same thing as their existing dense counterparts (same names without the _sparse suffix), but with X being a csc matrix passed as the triplet X_data, X_indices, X_indptr.
See the current implementations for CD (not BCD) in solvers/anderson_cd.py: sparse version https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L313 vs dense version https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L272
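To give an idea of the CSC access pattern, here is a rough sketch of what construct_grad_sparse could look like; this is not the exact signature to use, raw_grad stands for the datafit's gradient with respect to Xw, and grp_ptr / grp_indices / ws are the usual group bookkeeping and working set:

```python
import numpy as np
from numba import njit


@njit
def construct_grad_sparse(X_data, X_indices, X_indptr, raw_grad,
                          grp_ptr, grp_indices, ws):
    # gradient of the datafit w.r.t. the features of the working-set groups,
    # with X stored as a CSC triplet: the nonzeros of column j are
    # X_data[X_indptr[j]:X_indptr[j + 1]], at rows X_indices[X_indptr[j]:X_indptr[j + 1]]
    n_features = len(X_indptr) - 1
    grad = np.zeros(n_features)
    for g in ws:
        for j in grp_indices[grp_ptr[g]:grp_ptr[g + 1]]:
            # X[:, j] @ raw_grad, using only the nonzeros of column j
            tmp = 0.
            for idx in range(X_indptr[j], X_indptr[j + 1]):
                tmp += X_data[idx] * raw_grad[X_indices[idx]]
            grad[j] = tmp
    return grad
```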
Then, inside the solver, we just have an if when doing the epoch of coordinate descent: https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L147
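Schematically, the dispatch could look like the sketch below (the argument lists are placeholders, not the actual epoch signatures):

```python
from scipy import sparse


def run_bcd_epoch(X, y, w, Xw, grp_ptr, grp_indices, ws,
                  dense_epoch, sparse_epoch):
    # sketch of the branch: same BCD epoch, but the sparse variant
    # receives X as its CSC triplet instead of a dense array
    if sparse.issparse(X):
        sparse_epoch(X.data, X.indices, X.indptr, y, w, Xw,
                     grp_ptr, grp_indices, ws)
    else:
        dense_epoch(X, y, w, Xw, grp_ptr, grp_indices, ws)
```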
Do not hesitate to start a very WIP PR, so that we can discuss over there!
In the meantime, I will open a separate PR for the positive weights
Do we want a GroupLasso estimator as well?
I think it's a popular enough model that the visibility benefits of implementing its class outweigh the cost of maintenance: I am +1 on the GroupLasso estimator.
@tomaszkacprzak did you try our implementation of positive for the Group Lasso? And would you be interested in implementing sparse X support for groups?
@mathurinm I just saw it, going to give it a try. Then I'll start the PR for sparse X. Thanks!
@mathurinm I noticed that in test_group.py L79 the random seed is set so that with positive=True all coefficients come out as 0. When I change the random seed, some positive weights appear and the assertion error below is thrown. Example for random seed 12323:
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Mismatched elements: 5 / 50 (10%)
Max absolute difference: 0.46942555
Max relative difference: 0.22505575
x: array([0. , 0. , 0. , 0. , 0. , 0.183918,
0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.295971, 0. , 0. ,...
y: array([0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.241598, 0. , 0. ,...
Any suggestions what can cause this?
Thanks for the report, this seems to show that the positive=True feature introduced in https://github.com/scikit-learn-contrib/skglm/commit/1dcbfffa91999c68e23ab76c8b5dc00456766705 was not correct.
I will investigate this week.
Yes indeed @QB3, I opened PR #225.
This should be fixed now @tomaszkacprzak, thanks @Badr-MOUFAD
Great, I just checked it out and it works well. Thanks @Badr-MOUFAD. I'll start a PR for sparse X.
Fixed by #228