NonlinearSolve.jl

Default Automatic Differentiation Choice

Open avik-pal opened this issue 1 year ago • 8 comments

Forward AD

  • [x] Non-Sparse Default AD should be AutoPolyesterForwardDiff (if that package is loaded). This will be similar to SimpleNonlinearSolve

Reverse AD

  • For in-place problems, we default to AutoFiniteDiff. This is a really bad choice. We should default to: (conditional on the package being loaded)
    • [ ] AutoReverseDiff (after implementing the corresponding version in SparseDiffTools) for non-GPU versions
    • [ ] AutoEnzyme for all other cases

avik-pal avatar Jan 16 '24 13:01 avik-pal

Should we just depend on Polyester and make it the default here?

AutoPolyesterForwardDiff (if that package is loaded)

OrdinaryDiffEq.jl depends on Polyester, so you could get an odd interaction where some code runs faster or slower depending on whether the ODE solver is loaded, and that would be invisible to many users.

For in-place problems, we default to AutoFiniteDiff. This is a really bad choice. We should default to: (conditional on the package being loaded)

Why not Forward? Are you talking about a specific size of the Jacobian?

ChrisRackauckas avatar Jan 18 '24 08:01 ChrisRackauckas

I am still debating whether to make PolyesterForwardDiff the default: for the Brusselator (bruss) problem we see a clear improvement, but for the battery problem there is a slowdown. I need to investigate this a bit to verify that it is not my code that is problematic.

avik-pal avatar Jan 18 '24 08:01 avik-pal

Why not Forward? Are you talking about a specific size of the Jacobian?

We could construct the full Jacobian and then compute the VJP, but the default was based on the implementations available in SparseDiffTools and was not updated after that. Currently we maintain the JacobianOperator in-house, so we can easily switch that as well.

avik-pal avatar Jan 18 '24 08:01 avik-pal

The problem is that if any other threads are in use, it's not going to be a speedup, since you'll lock the threads. This makes it pretty unsafe as a default unless the user knows that Polyester is going to be used. That is why in OrdinaryDiffEq.jl it's always an opt-in (and maybe something we can make into an extension), and I think the same would need to be done here.

I think we should highlight it in documentation and tutorials much better than we do now, since indeed for any large enough problem it's a good idea, but it's hard to make something that bypasses hierarchical threading into a default.

ChrisRackauckas avatar Jan 18 '24 08:01 ChrisRackauckas

We could construct the full Jacobian and then compute the VJP, but the default was based on the implementations available in SparseDiffTools and was not updated after that. Currently we maintain the JacobianOperator in-house, so we can easily switch that as well.

Oh you're talking about the default vjp, for some line searches?

ChrisRackauckas avatar Jan 18 '24 08:01 ChrisRackauckas

Oh you're talking about the default vjp, for some line searches?

For some of the line searches, and also if you use a Krylov method like LSMR, which requires both $J^T v$ and $J u$.
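
As a rough illustration of why both products matter (the symbols $f$, $x$, $u$, $v$ and $\epsilon$ are assumed notation, not taken from the thread): for a residual $f(x)$ with Jacobian $J = \partial f / \partial x$, forward mode gives $J u$ in a single pass, and reverse mode gives $J^T v$ in a single pass, while a finite-difference fallback only approximates the forward product directly.

$$
J u = \left. \frac{d}{d\epsilon} f(x + \epsilon u) \right|_{\epsilon = 0} \approx \frac{f(x + \epsilon u) - f(x)}{\epsilon},
\qquad
J^T v = \nabla_x \left( v^T f(x) \right).
$$

Finite differencing the scalar $v^T f(x)$ would need roughly one extra evaluation of $f$ per component of $x$, which is one way to see why a reverse-mode default is attractive for the VJP.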

avik-pal avatar Jan 19 '24 03:01 avik-pal

That is why in OrdinaryDiffEq.jl it's always an opt-in (and maybe something we can make into an extension), and I think the same would need to be done here.

Do you have a link to the docs for that? We can make it consistent here.

avik-pal avatar Jan 19 '24 03:01 avik-pal

It's not documented well, and it's used in a very different way: in some methods you can just set threads=PolyesterThreads(). We should highlight it in the docs and make it into a package extension, though.

ChrisRackauckas avatar Jan 19 '24 07:01 ChrisRackauckas