
Bounded + Constrained optimisation

johannahaffner opened this issue 10 months ago

This is the current state of my development branch, shared here as a basis for discussion.

I still have some feature branches (second-order corrections in IPOPT, constrained trust regions, OSQP) that I'm going to merge into this once they are a little cleaner.

This also addresses https://github.com/patrick-kidger/optimistix/issues/116 by including a projected gradient descent, similar to Optax's implementation of projections.
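For context on what "projected gradient descent" means here, a minimal, hypothetical sketch (box bounds, with clipping as the projection); the names and the fixed-step loop are illustrative, not the API of the development branch:

```python
import jax
import jax.numpy as jnp

def projected_gradient_descent(f, x0, lower, upper, lr=1e-2, steps=100):
    # Minimise f over the box [lower, upper] by following each gradient step
    # with a Euclidean projection (here just a clip) back onto the box.
    grad_f = jax.grad(f)

    def step(x, _):
        x = x - lr * grad_f(x)          # unconstrained gradient step
        x = jnp.clip(x, lower, upper)   # project back onto the feasible box
        return x, None

    x_final, _ = jax.lax.scan(step, x0, xs=None, length=steps)
    return x_final

# Example: the unconstrained optimum (x = 2) lies outside the unit box,
# so the constrained solution sits on the boundary at x = 1.
x_opt = projected_gradient_descent(
    lambda x: jnp.sum((x - 2.0) ** 2), jnp.zeros(3), lower=0.0, upper=1.0
)
```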

johannahaffner avatar Mar 05 '25 20:03 johannahaffner

Tests fail because they require the generalised lx.DiagonalLinearOperator that takes a PyTree, rather than a vector, as its input (added in https://github.com/patrick-kidger/lineax/pull/125). The latest lineax release predates this addition.
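To illustrate the dependency (a sketch only, not verified against the merged lineax code): the released `lx.DiagonalLinearOperator` takes a single 1D array, while the generalisation in lineax#125 is expected to accept a PyTree and act leafwise.

```python
import jax.numpy as jnp
import lineax as lx

# Released lineax: the diagonal is a single 1D array.
op = lx.DiagonalLinearOperator(jnp.array([1.0, 2.0, 3.0]))
out = op.mv(jnp.array([1.0, 1.0, 1.0]))  # elementwise product with the diagonal

# After lineax#125 (unreleased at the time of this comment): the diagonal is
# expected to be an arbitrary PyTree, with `mv` acting leafwise on a matching
# PyTree. Sketch of the intended usage only.
diag = {"a": jnp.array([1.0, 2.0]), "b": jnp.array([3.0])}
op_tree = lx.DiagonalLinearOperator(diag)
out_tree = op_tree.mv({"a": jnp.array([1.0, 1.0]), "b": jnp.array([2.0])})
```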

johannahaffner avatar Mar 05 '25 20:03 johannahaffner

This is awesome. I'll start taking a look through it.

Good point about Lineax, we should do a release. Send a version bump PR and I'll approve?

patrick-kidger avatar Mar 06 '25 11:03 patrick-kidger

Note: tests pass locally, but I'm experimenting with small breaks to the descent API and pyright is not happy. Looks like we also need to upgrade pyright while we're here.

johannahaffner avatar Apr 16 '25 20:04 johannahaffner

Just checking in on whether bounded optimisation is still a goal? I notice a few branches/PRs got closed with the 0.11 release (https://github.com/patrick-kidger/optimistix/pull/143 and https://github.com/patrick-kidger/optimistix/pull/150 - apologies for not getting around to helping much with the latter!), but I see this is still open. A quick note on the zoom linesearch: at this point I imagine it looks quite different to the one in jaxopt. I'm near convinced the jaxopt implementation has bugs, so I wouldn't rely on mimicking it too much if the implementation here still looks similar. I can't speak to the optax one, but I believe it was originally a port of the jaxopt version.

cjchristopher avatar Oct 21 '25 07:10 cjchristopher

@cjchristopher thanks for the warning! Can you please point to the bugs? I started with a naive implementation, but over time the zoom linesearch ended up being quite similar to the JAXopt and Optax versions, and I did rely on them for some of it, so it would be great to avoid known bugs.

bagibence avatar Oct 21 '25 11:10 bagibence

> @cjchristopher thanks for the warning! Can you please point to the bugs? I started with a naive implementation, but over time the zoom linesearch ended up being quite similar to the JAXopt and Optax versions, and I did rely on them for some of it, so it would be great to avoid known bugs.

Ahah I wish they were "known" bugs - this is partially on me.

Unfortunately I haven't taken the time to explicitly isolate them; all I know for certain is that I got very broken behaviour when I had a codepath that used the zoom linesearch in jaxopt. There are a few tickets on the jaxopt tracker that refer to pretty big divergences in optimiser behaviour between some of the jaxopt routines and their scipy equivalents, and to my knowledge these were never explained or resolved satisfactorily.

In my own attempts to get coherent behaviour from jaxopt's L-BFGS-B, I traced the issue to Zoom, but due to deadlines I never got a chance to fully reconcile the implementation with what is given in the original paper (or to compare it against reference implementations in other languages) - I just pivoted to projected GD to get some meaningful results. I do recall spotting what seemed like a big difference in the initialisation steps, though this was after the archival announcement for jaxopt. I wouldn't put too much weight on my testimony here until I have time to go back and be firm about any discrepancies I thought I saw. Based on all the back-and-forth detail in your zoom PR, though, my prior is that you've been careful enough so far!

cjchristopher avatar Oct 21 '25 12:10 cjchristopher

> Just checking in on whether bounded optimisation is still a goal?

Yup, it is! Just one of those things that happens as-and-when Johanna and I get time on our weekends. :D

> I'm near convinced the jaxopt implementation has bugs, so I wouldn't rely on mimicking it too much if the implementation here still looks similar.

Yikes, that's a bit scary. That probably means using papers and other implementations as references here.

patrick-kidger avatar Oct 21 '25 13:10 patrick-kidger

> Yikes, that's a bit scary. That probably means using papers and other implementations as references here.

I don't intend to cause a panic, haha. But this is one such issue from jaxopt: https://github.com/google/jaxopt/issues/620.
I observed behaviour similar to some of the warnings reported in that issue, where the zoom would end up throwing the point some way outside the bounds. In jaxopt, from memory, due to how and where the bounds were enforced, this could cause all sorts of bizarre behaviour; exploding gradients are something I saw a few times. A suggestion on another issue referenced in the one above was to allow it to fail, but this also didn't work.
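As a rough illustration of the failure mode described (not jaxopt code, and with a made-up objective): if the objective is only well-defined inside the bounds, a trial point thrown outside them immediately poisons both the objective and its gradient.

```python
import jax
import jax.numpy as jnp

# Hypothetical objective, only well-defined for x > 0 (barrier-like term).
def f(x):
    return jnp.sum(x**2 - jnp.sqrt(x))

x = jnp.array([0.25, 0.25])            # current iterate, inside the bounds x > 0
p = jnp.array([-1.0, -1.0])            # search direction
alpha = 1.0                            # an over-long trial step, as an expanding
trial = x + alpha * p                  # linesearch might propose: leaves x > 0
print(f(trial), jax.grad(f)(trial))    # nan objective and nan gradient
```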

In any case, I guess this is just a gentle nudge towards rigour and reference - notably, if possible, making sure behaviour roughly matches that of known-good implementations elsewhere (numerical quirks notwithstanding).

Edit: digging up that reference has nudged my brain slightly. I think the initialisation comment in my previous reply is because I was often getting the warning about there being no feasible range satisfying the conditions on the very first iteration of zoom, on the first or second iteration of L-BFGS-B. I traced the code to determine under what situations that gets raised, and I can't recall precisely, but I think it was basically guaranteed to happen due to a missed step from the algorithms (3.5 and 3.6 in Nocedal and Wright) with respect to the initial choice of \alpha_1. It was then luck whether you happened to get reasonable results (e.g. you stayed in your bounds and didn't get a gradient explosion) or not. My problem (and perhaps the one referenced in jaxopt issue 620) was perhaps unlucky?

Edit 2: actually, it might be that one of the checks in Nocedal Algorithm 3.5 is either missed or not made correctly. Sorry I can't be more precise!
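For reference, an editorial transcription of the outer bracketing loop of Nocedal & Wright Algorithm 3.5 (not the jaxopt or optimistix code; all names are illustrative, and `zoom` stands for Algorithm 3.6, passed in as a callable). The three checks below, the `i > 1` guard, and the choice of the initial trial step \alpha_1 are the easy places to slip:

```python
def strong_wolfe_bracket(phi, dphi, zoom, alpha_1, alpha_max,
                         c1=1e-4, c2=0.9, max_iter=20):
    # phi(a) = f(x + a*p) along the search direction p, dphi is its derivative,
    # and zoom(lo, hi) is Nocedal & Wright Algorithm 3.6.
    phi_0, dphi_0 = phi(0.0), dphi(0.0)
    alpha_prev, phi_prev = 0.0, phi_0
    alpha = alpha_1                           # the initial trial step discussed above
    for i in range(1, max_iter + 1):
        phi_a = phi(alpha)
        # Check 1: sufficient decrease violated, or (only for i > 1) no improvement
        # over the previous trial -> a minimiser is bracketed in [alpha_prev, alpha].
        if phi_a > phi_0 + c1 * alpha * dphi_0 or (i > 1 and phi_a >= phi_prev):
            return zoom(alpha_prev, alpha)
        dphi_a = dphi(alpha)
        # Check 2: strong Wolfe curvature condition holds -> accept alpha.
        if abs(dphi_a) <= -c2 * dphi_0:
            return alpha
        # Check 3: the slope has turned non-negative -> bracket with reversed order.
        if dphi_a >= 0.0:
            return zoom(alpha, alpha_prev)
        alpha_prev, phi_prev = alpha, phi_a
        alpha = min(2.0 * alpha, alpha_max)   # otherwise expand towards alpha_max
    return alpha
```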

cjchristopher avatar Oct 21 '25 14:10 cjchristopher

Right! So this is still happening :D

I'll close this draft PR, since it is extremely outdated by now. I hope you don't mind continuing the discussion about the Zoom line search in that PR, @bagibence and @cjchristopher :)

johannahaffner avatar Oct 23 '25 20:10 johannahaffner

@cjchristopher did they have an L-BFGS-B with a Zoom line search? That sounds like an odd choice. The fast convergence of the Zoom line search comes from the acceptance criteria it applies, but also from its ability to increase step lengths until it finds one meeting those criteria. If the proposed step is such that a step length of at most unity could be taken without leaving the feasible set, then the Zoom line search may easily disregard that - and preventing it from doing so means adding safeguards in a whole bunch of places, probably throwing away most of the advantages it has over a more conservative backtracking line search, which is guaranteed to stay within the feasible set.
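A minimal sketch of that feasibility argument (illustrative names, not optimistix API): if the full step x + p is feasible and the feasible set is convex, a line search that only ever shrinks the step from alpha = 1 cannot leave it, whereas one that expands alpha beyond 1 loses that guarantee.

```python
import jax
import jax.numpy as jnp

def backtracking_step(f, x, p, c1=1e-4, shrink=0.5, max_iter=20):
    # Armijo backtracking from alpha = 1. If x and x + p both lie in a convex
    # feasible set (e.g. a box), then x + alpha * p lies in it for every
    # alpha in (0, 1], so shrinking alpha never leaves the set.
    f_x = f(x)
    slope = jnp.vdot(jax.grad(f)(x), p)    # directional derivative at x
    alpha = 1.0
    for _ in range(max_iter):
        if f(x + alpha * p) <= f_x + c1 * alpha * slope:
            return x + alpha * p           # sufficient decrease: accept
        alpha *= shrink                    # only ever shrink, never expand
    return x                               # fall back to taking no step
```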

PS: putting this here - happy to continue the conversation elsewhere, once we have a new open PR for any of these things :)

johannahaffner avatar Oct 23 '25 20:10 johannahaffner