copt
copt copied to clipboard
WIP: implementation of non-contiguous group lasso penalty GroupL1nc
This PR adds a new class to the copt.penalty module which defines a non-contiguous version of the existing GroupL1 penalty. The main differences between the two are:
- Parsing of the input. GroupL1nc requires just non-overlapping groups.
- Definition of
B
matrix adapted to non-contiguous setting (it's no longer block-diagonal). - Evaluation of
_prox_gl
adapted to non-contiguous setting. - Result of simple test adapted to new block matrix.
Minor style/PEP8 improvements have been applied to the modified files.
I've still got a couple of concerns on which I would like to have your opinion @fabianp .
- The
B
matrix inprox_factory
that I implemented in GroupL1nc is slightly different with respect to the one of GroupL1. The difference is in how the "forgotten" features are encoded inB
. Here's a small example. Letn_features = 5
andgroups = [(0, 1), (3, 4)]
, then theB
matrix for the two methods is the following:
b_GroupL1 = [[1.0, 1.0, 0.0, 0.0, 0.0],
[0.0, 0.0, -1.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 1.0, 1.0]]
b_GroupL1nc = [[1.0, 1.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 1.0, 1.0],
[0.0, 0.0, -1.0, 0.0, 0.0]]
I made a few tests and it seems to me that this does not change the result, but I don't have a proof for it. What happens is that we reorder the groups by putting all the "-1" singletons after the groups. Are you aware of any reason why it shouldn't work?
- My second concern is about the group structure. Are we good with accepting only non-overlapping groups or do we want a more general code that accepts overlapping groups or even tree structures?
After solving these two points we can think of merging GroupL1 and GroupL1nc, as the latter trivially generalizes the former.
This PR addresses #20 .
Coverage increased (+0.2%) to 88.959% when pulling be2c4e921555e606df7ff1e9ae9ee0cf447d469d on matteofrigo:groupL1-noncontiguous into e5cb995065e26acd543ef13a330351eb1c381bdd on openopt:master.
Thanks for the contribution! This already looks great.
I agree that it would be nice to have GroupL1 and GroupL1nc merged as a single object.
I agree with you that intuitively the order in which the forgotten groups appear should not matter for the penalty. This might change if we allow to have a different per-group regularization, but that's not the case yet.
This last part makes me wonder if we should weight the group regularization by the size of the group, as done for instance in Eq. (1) in https://statweb.stanford.edu/~tibs/ftp/sparse-grlasso.pdf . I think this makes a lot of sense when groups have very different sizes. Do you have an opinion on this @matteofrigo ? Of course this is somewhat orthogonal to your pull request, so feel free to ignore this point or postpone for a later contribution
As for the second concern, I would stop at non-overlapping groups, at least for the pull request. The code for more general objects would be significantly different, and which point I would split it in a different object.
I added the possibility to define group-specific penalties and some more general tests. Also, the two classes are merged back to a single one (GroupL1).
For the moment I added the possibility to use group-wise weights that are like the ones in the paper you mentioned (sqrt(len(g))
), it's inverse, and custom weights passed by the user. By default it uses a weight equal to 1 for each group. Anything else that you think it should go there @fabianp ?
It seems like my changes to how the B matrix is built broke everything. I'll look into it. For the dense matrices case it works, anyway.
The problem was in a test where the group sparsity was used with overlapping groups. I added a separate class for that case to keep the two worlds separate as @fabianp suggested in https://github.com/openopt/copt/pull/94#issuecomment-809608046 .
As for the second concern, I would stop at non-overlapping groups, at least for the pull request. The code for more general objects would be significantly different, and which point I would split it in a different object.
To summarise, here's what we did with this PR until now:
- Separated overlapping- (OGroupL1) and non-overlapping-groups (GroupL1) cases;
- Generalised GroupL1 to non-contiguous groups;
- Improved tests for GroupL1;
- Adapted tests for overlapping groups;
- Fixed minor PEP8 issues.
Anything to add?
Thanks Matteo. We're almost there, but there's a test that has been commented out while I think its important to keep it, as it's testing a valid implementation of the overlapping group lasso
Sorry @fabianp , I've been offline for some days.
I removed the test because I have misunderstood the suggestion to avoid the overlapping groups case.
Speaking about that test, I'm not able to make it pass. I haven't been able to track the bug to its root, but it seems to me that the new definition of the block matrix (in the penalty) changes the definition of the support matrix in a way that makes it wrong. Notice that the only actual difference with the old implementation of the penalty is the order in which we treat the groups. Do you have any low-level way to debug and test the prox_factory
function? I haven't found anything except from testing the correctness of the block matrix.