ADNLPModels.jl icon indicating copy to clipboard operation
ADNLPModels.jl copied to clipboard

Fix type instabilities in ADNLPModels.jl

Open MaxenceGollier opened this issue 7 months ago • 23 comments

#347

@amontoison @tmigot This is where I am so far.

Attached are my benchmark results... grad_benchmarks.zip

MaxenceGollier avatar Jun 05 '25 09:06 MaxenceGollier

@tmigot Are you fine with the new set_adbackend ?

amontoison avatar Jun 06 '25 14:06 amontoison

@tmigot Are you fine with the new set_adbackend ?

I will check that today

tmigot avatar Jun 06 '25 17:06 tmigot

Why the tests are failing? Is it just because NLS grad! function uses other function behind and should not be tested or is there some typing issue in NLPModels?

Your benchmarks are indeed surprising @MaxenceGollier , what were the tests problems?

tmigot avatar Jun 07 '25 00:06 tmigot

@MaxenceGollier @amontoison I moved the PR from draft to review to run the package benchmark by curiosity

tmigot avatar Jun 07 '25 00:06 tmigot

Why the tests are failing? Is it just because NLS grad! function uses other function behind and should not be tested or is there some typing issue in NLPModels?

That's what I have been trying to figure out, some Hessian (with JET I mean) tests also seem to fail for some reason so there might be other unstabilities hidden in this repo...

what were the tests problems?

What do you mean ? as I said, I still have to figure things out, I have a lot of urgent work to do on something else next week, I will be available to work this out the week after that...

Can you create an issue in NLPModelsTest.jl to add this in all models?

Yes good idea, I found similar issues on QuadraticModels.jl... (still need to open an issue but I don't have much time currently)

MaxenceGollier avatar Jun 07 '25 09:06 MaxenceGollier

What do you think of this version of set_adbackend ? @amontoison @tmigot. I tried to implement all your suggestions

MaxenceGollier avatar Jun 26 '25 12:06 MaxenceGollier

I think tests keep failing due to inherent type instabilities in ForwardDiff.jl. See in ForwardDiff/docs/advanced.md :

If your input dimension is constant across calls, you should explicitly select a chunk size rather than relying on ForwardDiff's heuristic. There are two reasons for this. The first is that ForwardDiff's heuristic depends only on the input dimension, whereas in reality the optimal chunk size will also depend on the target function. The second is that ForwardDiff's heuristic is inherently type-unstable, which can cause the entire call to be type-unstable.

The remaining type instabilities should be solved by having a closer look to those of ForwardDiff.

MaxenceGollier avatar Jun 26 '25 14:06 MaxenceGollier

@tmigot Can you take care of the review? I am still traveling this week and won't be able to have a look before 2 weeks at least.

amontoison avatar Jun 29 '25 22:06 amontoison

I think tests keep failing due to inherent type instabilities in ForwardDiff.jl. See in ForwardDiff/docs/advanced.md :

If your input dimension is constant across calls, you should explicitly select a chunk size rather than relying on ForwardDiff's heuristic. There are two reasons for this. The first is that ForwardDiff's heuristic depends only on the input dimension, whereas in reality the optimal chunk size will also depend on the target function. The second is that ForwardDiff's heuristic is inherently type-unstable, which can cause the entire call to be type-unstable.

The remaining type instabilities should be solved by having a closer look to those of ForwardDiff.

I am unfortunately not too surprised by this. Thanks to your analysis we will improve our part of the code. We should try to think of a way to still test that our code is good.

tmigot avatar Jun 30 '25 01:06 tmigot

bump

amontoison avatar Jul 22 '25 05:07 amontoison

I am unfortunately not too surprised by this. Thanks to your analysis we will improve our part of the code. We should try to think of a way to still test that our code is good.

I agree. Perhaps the best we can do is to call the report_opt macro but let it ignore all the issues from the ForwardDiff package. If our code is stable with all other backends then we'd consider that our code is good enough.

MaxenceGollier avatar Aug 06 '25 10:08 MaxenceGollier

I think this is ready for another round of review @tmigot @amontoison. FreeBSD fails because of timeout. And i don't know how to run the benchmarks in GitHub...

MaxenceGollier avatar Sep 15 '25 16:09 MaxenceGollier

@MaxenceGollier Can you rebase your branch and fix the CI tests?

amontoison avatar Oct 16 '25 05:10 amontoison

@amontoison FreeBSD fails because we should use [email protected] if we use Julia 1.12 and @v0.9 for Julia < 1.12. Do you know how we could do this well ?

MaxenceGollier avatar Oct 16 '25 19:10 MaxenceGollier

@amontoison FreeBSD fails because we should use [email protected] if we use Julia 1.12 and @v0.9 for Julia < 1.12. Do you know how we could do this well ?

I update the compat entry to support both JET 0.9, 0.10.

amontoison avatar Oct 16 '25 19:10 amontoison

Thank you @amontoison, I need to update the doc now. I would like to see the benchmarks ran by GitHub before i do that how can i get them to run ?

MaxenceGollier avatar Oct 16 '25 21:10 MaxenceGollier

We need to open another PR to fix the benchmarks with Julia 1.12. Once it is merged, we can rebase this branch on top of it. I can't have a look before Sunday / next week if you don't do it before.

amontoison avatar Oct 16 '25 21:10 amontoison

@tmigot We should be consistent for Maxence. I think a non-inplace set_backend is still relevant and it can return a new ADNLPModel.

amontoison avatar Oct 21 '25 22:10 amontoison

@tmigot We should be consistent for Maxence and don't give contrary comments. I think a non-inplace set_backend is still relevant and return a new ADNLPModel.

Do we plan to add the new set_adbackend in this PR?

tmigot avatar Oct 21 '25 22:10 tmigot

@tmigot We should be consistent for Maxence and don't give contrary comments. I think a non-inplace set_backend is still relevant and return a new ADNLPModel.

Do we plan to add the new set_adbackend in this PR?

We can just rename set_adbackend! as set_adbackend and specify that we return a new ADNLPModels where we "recycle" the backends + setup the new one instead of updating the current ADNLPModels. It should be quite minor modifications in the docstring and code.

What do you think?

amontoison avatar Oct 21 '25 23:10 amontoison

Sounds, good.

tmigot avatar Oct 21 '25 23:10 tmigot

Big GitHub noob question before moving on: where can i see the benchmark results posted ?

Shouldn't it be posted as a comment here ?

MaxenceGollier avatar Oct 22 '25 02:10 MaxenceGollier

A rebase was needed for sure, let's see if doing a PR from a fork is also an issue. I fixed a few CI errors, reintroduced the set_adbackend and updated the related documentation. When the PR will be ready, I will ask Maxence to get ownership of the commits.

amontoison avatar Oct 22 '25 04:10 amontoison

Hi guys, I think we should move forward with this, I believe this is an important improvement for this package. As discussed above, I need to have the commits in a branch of this repo and not a local fork for the benchmarks to run but i don't have rights to even create a branch.

Perhaps we could

  1. Create a branch, say mgollier-type-stab-dev on this repo,
  2. I make a PR from the branch in my fork with the commits to this new branch
  3. We run the benchmarks from there
  4. Open a final PR from the mgollier-type-stab-dev branch to the master branch with the updates in the docs etc.

Please let me know.

MaxenceGollier avatar Dec 15 '25 18:12 MaxenceGollier

Superseded by #365.

MaxenceGollier avatar Dec 18 '25 10:12 MaxenceGollier