
Support OWL-QN method (L-BFGS with L1-regularization)

vbkaisetsu opened this pull request 3 years ago • 2 comments

Fix #243

This PR adds an L1-regularization option to L-BFGS. With this option enabled, L1 regularization is performed via the OWL-QN method.

vbkaisetsu avatar Aug 12 '22 12:08 vbkaisetsu

Thanks a lot for this PR! I will try to review it during the weekend (but unfortunately I can't guarantee it).

At first sight I was wondering if it would be better to have a dedicated OWL-QN solver instead of adding it to the existing L-BFGS. This would lead to code duplication, but it might also be easier to understand and to maintain in the long run (and perhaps better in terms of performance as well). I'm not yet sure which approach is best, so I'd be interested in your opinion on this.

stefan-k avatar Aug 12 '22 13:08 stefan-k

I also initially thought it would be better to define a separate structure. However, on second thought, OWL-QN only adds a few processing steps to L-BFGS, and a separate structure would incur a high maintenance cost, since every change to L-BFGS would have to be reflected in OWL-QN.
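For context, the main extra step OWL-QN adds is the pseudo-gradient: where the L1 term makes the objective f(x) + c·‖x‖₁ non-differentiable (at xᵢ = 0), the one-sided derivative that points downhill is used, or zero if neither does. A minimal NumPy sketch, not the PR's code (the function name and signature are hypothetical):

```python
import numpy as np

def pseudo_gradient(x, grad, c):
    """Pseudo-gradient of f(x) + c * ||x||_1 as used by OWL-QN.

    x:    current parameter vector
    grad: gradient of the smooth part f at x
    c:    L1 regularization coefficient (c >= 0)
    """
    # Away from zero the L1 term is differentiable: add c * sign(x_i).
    pg = np.where(x > 0, grad + c, np.where(x < 0, grad - c, 0.0))
    # At x_i == 0, pick the descending one-sided derivative, if any.
    at_zero = (x == 0)
    right = grad + c  # right-hand derivative at 0
    left = grad - c   # left-hand derivative at 0
    pg = np.where(at_zero & (right < 0), right, pg)
    pg = np.where(at_zero & (left > 0), left, pg)
    return pg
```

Since right ≥ left always holds for c ≥ 0, at most one of the two zero-point branches can fire, so the component is left at 0 exactly when 0 lies between the one-sided derivatives.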

vbkaisetsu avatar Aug 12 '22 13:08 vbkaisetsu

> OWL-QN only adds a few processing steps to L-BFGS, and a separate structure would incur a high maintenance cost, since every change to L-BFGS would have to be reflected in OWL-QN.

Yes, after having a more detailed look at the code I tend to agree. The only downside is the increased number of math-related trait bounds which add an additional implementation burden on people using their own types. But I think this is acceptable for now. We can reconsider this if it starts to become a problem.

stefan-k avatar Aug 14 '22 06:08 stefan-k

Thank you for your review. I addressed your comments and also fixed several bugs: the pseudo-gradient should also be used in the line search, but the previous implementation used the normal gradient. I also added an L1-regularized version of the Rosenbrock function. Comparing the results with the following grid search, I believe they are correct.

import numpy as np

# Rosenbrock w/o regularization
best = (0.0, 0.0, float('inf'))
for x in np.arange(0, 2, 0.001):
    for y in np.arange(0, 2, 0.001):
        z = (1.0 - x)**2 + 100 * (y - x**2)**2
        if z < best[2]:
            best = (x, y, z)
print(best)
#=> (1.0, 1.0, 0.0)

# Rosenbrock w/ L1 regularization (coeff = 1.0)
best = (0.0, 0.0, float('inf'))
for x in np.arange(0, 2, 0.001):
    for y in np.arange(0, 2, 0.001):
        z = (1.0 - x)**2 + 100 * (y - x**2)**2 + np.abs(x) + np.abs(y)
        if z < best[2]:
            best = (x, y, z)
print(best)
#=> (0.249, 0.057, 0.8725020001)
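As a quick sanity check on these numbers (a hypothetical helper, not part of the PR): the regularized cost at the unregularized minimum (1, 1) is 2.0, larger than the ~0.8725 found at (0.249, 0.057), which confirms that the L1 term pulls the optimum toward zero:

```python
import numpy as np

def rosenbrock_l1(x, y, c=1.0):
    # Rosenbrock function plus an L1 penalty with coefficient c
    # (illustrative helper matching the grid search above)
    return (1.0 - x)**2 + 100 * (y - x**2)**2 + c * (np.abs(x) + np.abs(y))

print(rosenbrock_l1(1.0, 1.0))      # 2.0: the L1 penalty alone contributes |1| + |1|
print(rosenbrock_l1(0.249, 0.057))  # close to the grid-search optimum, ~0.8725
```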

TODO: Add the L1 term to the cost function of LineSearchProblem.
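One possible shape of that fix, sketched here under assumptions (the actual LineSearchProblem interface in argmin differs, and the helper below is hypothetical): the line-search objective should evaluate the smooth cost at the trial point and add the L1 penalty there as well, so the step length accounts for the regularization.

```python
import numpy as np

def line_search_cost(smooth_cost, x, direction, alpha, c):
    # Hypothetical line-search objective: unregularized cost at the
    # trial point plus the L1 penalty evaluated at the same point.
    x_trial = x + alpha * direction
    return smooth_cost(x_trial) + c * np.sum(np.abs(x_trial))
```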

vbkaisetsu avatar Aug 14 '22 12:08 vbkaisetsu

Thank you for your review. I updated the documentation.

vbkaisetsu avatar Aug 16 '22 23:08 vbkaisetsu