probability icon indicating copy to clipboard operation
probability copied to clipboard

feat: implementation of bounded LBFGS algorithm

Open mikeevmm opened this issue 4 years ago • 10 comments

This commit implements the bounded version of the Limited-memory Broyden–Fletcher–Goldfarb–Shanno optimization algorithm, per [1], in lbfgsb.py. It is lacking unit tests (which have been run outside of the committed changes), because I don't understand how to properly implement them alongside, e.g., the LBFGS unit tests. The following unit tests were ran:

  • Converges to a 60-dimensional quadratic bowl
  • Converges to the correct minimum of the Rosenbrock surface given sufficiently far-away bounds
  • Converges to [5., 5.] in the Rosenbrock surface, when given bounds [(5, 20), (-10, 5)], which matches the original Fortran implementation.

I now need help making sure the code is up to the style guide, per issue #1273 .

[1] Richard H. Byrd, Peihuang Lu, Jorge Nocedal, & Ciyou Zhu (1995). A Limited Memory Algorithm for Bound Constrained Optimization SIAM Journal on Scientific Computing, 16(5), 1190–1208. https://doi.org/10.1137/0916069

mikeevmm avatar Apr 04 '21 18:04 mikeevmm

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

googlebot avatar Apr 04 '21 18:04 googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

google-cla[bot] avatar Apr 04 '21 18:04 google-cla[bot]

@googlebot I signed it!

mikeevmm avatar Apr 04 '21 18:04 mikeevmm

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

google-cla[bot] avatar Apr 04 '21 18:04 google-cla[bot]

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

googlebot avatar Apr 04 '21 18:04 googlebot

Hi Miguel, thanks very much for this PR. This is a really nice contribution, and it's great to see such thorough and well-documented code. I made a quick pass with a few minor style contents, but overall this looks to be in quite good shape.

I do have a high-level concern around the use of ragged Tensors. As you note, they can be pretty finicky, they make it tricky to work with batches, and they raise compatibility issues with, e.g., our JAX backend. It's not a dealbreaker, but if you're willing, I'd like to try to help think through whether another approach could work. I haven't fully worked through the code yet---to help me understand, can you explain what the ragged Tensors are being used for, and why they need to be ragged?

Before checking this in, we'll need to add some tests, along with corresponding entries in the BUILD file (you can use the entries for lbfgs and lbfgs_test as a model). I don't think the tests need to interact with the LBFGS tests, necessarily (although feel free to copy any that seem relevant), but at minimum we need a file lbfgsb_test.py that follows the pattern of lfbgs_test and other tests:

  1. It should define a test class that inherits from tensorflow_probability.python.internal.test_util.TestCase
  2. This class should be decorated with @test_util.test_all_tf_execution_regimes (i.e., runs tests in graph and eager modes)
  3. The bottom of the file should have the the magic lines if __name__ == '__main__': tf.test.main().
  4. Unit tests are implemented as methods of the test class whose names start with test.

It sounds like you already have some tests, so if you could adapt them to add to this PR that'd be a great start. Is there anything in particular you're not sure about? I'm happy to try to help figure out the best way to set things up.

  • Dave

davmre avatar Apr 08 '21 22:04 davmre

Hello,

Thank you for the kind words. I am again a bit swamped, so I will return to this when I can, but I have read all of your comments and have no objections.

Re. ragged tensors: they were more in use (and a greater source of problems) previously, as I was keeping track of free_vars_idx and other free-variable-related tensors with ragged arrays. I eventually replaced those with tensors with flag values, so it should no longer be a problem.

Right now, as far as I can tell and remember, ragged tensors are only in use in lines 576-590, to transpose differently between batches, and since these are immediately to_tensored, I don't expect it to be a problem (but really I wouldn't know).

I have also found that the current approach evaluates the cost function far far more often than the Fortran implementation (about 10 times as much in my use case). Depending on how costly this is, it can be a real bottleneck (this turned out to be the case for me). I am working on implementing the direct primal method to circumvent the double use of hanger_zhang_line_search.

mikeevmm avatar Apr 14 '21 14:04 mikeevmm

Hi! I was curious if you had time to investigate addressing / implementing these issues for LBFGS-B. We have some internal use cases for it, so I think we would be excited to have it! I am also happy to help / use this PR as a foundation for checking in an implementation for LBFGS-B.

Thank you! Srinivas

srvasude avatar Jun 24 '21 21:06 srvasude

Hello @srvasude,

I've uploaded the more recent changes I've made to the code.

It remains to write unit tests, and modifying the code to accept more batch dimensions. The modifications of commit 41b2267 should help a lot with this latter point, because the queue no longer has the batch dimensions "in the middle", and are instead leading. This means a lot of the stuff that's using einsum and ... should already work out of the box with many batch dimensions. Regardless, the code still needs some adapting for that to work.

Commit 64d30a4 is meant to be a starting point for writing the unit tests. However, it would maybe be best to implement multiple batching dimensions before writing the tests.

Miguel

mikeevmm avatar Jun 25 '21 14:06 mikeevmm

Hi, It seems like this wasn't picked up. Is it still worthwhile? @srvasude: Have you implemented this elsewhere? Best, Miguel

mikeevmm avatar Aug 25 '25 21:08 mikeevmm