Support OWL-QN method (L-BFGS with L1-regularization)
Fix #243
This PR adds an L1-regularization option to LBFGS. When the option is set, L1 regularization is performed using the OWL-QN method.
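For reference, the core change OWL-QN makes to L-BFGS is replacing the gradient of f(x) with a pseudo-gradient of f(x) + c * ||x||_1, which picks the one-sided derivative at x_i = 0. A minimal NumPy sketch of that pseudo-gradient (illustrative only; the names are not the PR's actual API):

import numpy as np

def pseudo_gradient(x, grad_f, c):
    """Pseudo-gradient of f(x) + c * sum(|x_i|), given grad_f = f'(x)."""
    pg = np.zeros_like(x)
    # Where x_i != 0, |x_i| is differentiable and contributes c * sign(x_i).
    nonzero = x != 0
    pg[nonzero] = grad_f[nonzero] + c * np.sign(x[nonzero])
    # Where x_i == 0, take the one-sided derivative that points downhill,
    # or 0 if the subgradient interval [g_i - c, g_i + c] contains 0.
    zero = ~nonzero
    right = grad_f[zero] + c  # derivative when moving toward x_i > 0
    left = grad_f[zero] - c   # derivative when moving toward x_i < 0
    pg[zero] = np.where(right < 0, right, np.where(left > 0, left, 0.0))
    return pg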
Thanks a lot for this PR! I will try to review it over the weekend (but unfortunately I can't guarantee it).
At first sight I was wondering if it would be better to have a dedicated OWL-QN solver instead of adding it to the existing L-BFGS. This would lead to some code duplication, but it might also be easier to understand and maintain in the long run (and it might be better in terms of performance as well). I'm not sure yet which approach is best, so I'd be interested in your opinion on this.
I also initially thought it would be better to define a separate structure. However, on second thought, OWL-QN only adds a few processing steps to L-BFGS, and a separate structure would incur a high maintenance cost, since every change to L-BFGS would have to be mirrored in OWL-QN.
Yes, after taking a more detailed look at the code I tend to agree. The only downside is the increased number of math-related trait bounds, which add an additional implementation burden on people using their own types. But I think this is acceptable for now. We can reconsider this if it starts to become a problem.
Thank you for your review. I fixed them. I also fixed several bugs: the pseudo-gradient should be used in the line search as well, but the previous implementation used the plain gradient (a sketch of the corresponding orthant projection follows the grid-search results below). I also added an L1-regularized version of the Rosenbrock function. Comparing the results with the following grid-search results, I believe the implementation is correct.
import numpy as np

# Rosenbrock w/o regularization
best = (0.0, 0.0, float('inf'))
for x in np.arange(0, 2, 0.001):
    for y in np.arange(0, 2, 0.001):
        z = (1.0 - x)**2 + 100 * (y - x**2)**2
        if z < best[2]:
            best = (x, y, z)
print(best)
#=> (1.0, 1.0, 0.0)

# Rosenbrock w/ L1 regularization (coeff = 1.0)
best = (0.0, 0.0, float('inf'))
for x in np.arange(0, 2, 0.001):
    for y in np.arange(0, 2, 0.001):
        z = (1.0 - x)**2 + 100 * (y - x**2)**2 + np.abs(x) + np.abs(y)
        if z < best[2]:
            best = (x, y, z)
print(best)
#=> (0.249, 0.057, 0.8725020001)
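Regarding the line-search fix mentioned above: besides using the pseudo-gradient, OWL-QN constrains each trial point of the line search to the orthant of the current iterate, zeroing any coordinate that would cross the axis. A rough NumPy sketch of that projection (illustrative, not the PR's Rust code; `pg` is the pseudo-gradient at `x`):

import numpy as np

def project_orthant(x_trial, x, pg):
    """Project a line-search trial point onto the orthant chosen at x."""
    # Where x_i != 0 the orthant is sign(x_i); where x_i == 0 it is the
    # descent side, i.e. -sign(pg_i).
    orthant = np.where(x != 0, np.sign(x), -np.sign(pg))
    # Coordinates that leave the orthant are clipped to zero.
    return np.where(np.sign(x_trial) == orthant, x_trial, 0.0)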
TODO: Add the L1 term to the cost function of LineSearchProblem.
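For reference, a hedged sketch of what that TODO would amount to, with hypothetical names rather than LineSearchProblem's actual interface: the 1D cost evaluated by the line search has to include the L1 term at the trial point.

import numpy as np

def line_search_cost(alpha, x, d, f, c):
    """1D cost along direction d; f is the smooth part, c the L1 coefficient."""
    x_trial = x + alpha * d
    # Without the c * ||x_trial||_1 term, the line search would accept steps
    # that decrease f but increase the regularized objective.
    return f(x_trial) + c * np.sum(np.abs(x_trial))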
Thank you for your review. I updated the documentation.