NonlinearSolve.jl
Default Automatic Differentiation Choice
Forward AD
- [x] Non-Sparse Default AD should be `AutoPolyesterForwardDiff` (if that package is loaded). This will be similar to SimpleNonlinearSolve.
Reverse AD
- For in-place problems, we default to AutoFiniteDiff. This is a really bad choice. We should default to the following (conditional on the package being loaded):
  - [ ] `AutoReverseDiff` (after implementing the corresponding version in SparseDiffTools) for non-GPU versions
  - [ ] `AutoEnzyme` for all other cases
Should we just depend on Polyester and make it the default here?
> AutoPolyesterForwardDiff (if that package is loaded)

OrdinaryDiffEq.jl depends on Polyester, so you might get an odd interaction where some code works better or worse depending on whether you have the ODE solver loaded, and this would be largely invisible to many users.
> For in-place problems, we default to AutoFiniteDiff. This is a really bad choice. We should default to: (conditional on the package being loaded)

Why not Forward? Are you talking about a specific size of the Jacobian?
I am still debating whether to make PolyesterForwardDiff the default. For the bruss (Brusselator) problem we see a clear improvement, but for the battery problem there is a slowdown. I need to investigate this a bit to verify that it is not my code that is problematic.
> Why not Forward? Are you talking about a specific size of the Jacobian?

We could construct the full Jacobian and then compute the VJP, but the default was based on the implementations available in SparseDiffTools and was not updated after that. We now maintain the JacobianOperator in-house, so we can easily switch that as well.
The problem is that if there are any other threads then it's not going to be a speedup since you'll lock the threads. This makes it pretty unsafe unless the user knows it's going to be using Polyester. That is why in OrdinaryDiffEq.jl it's always an opt-in (and maybe something we can make into an extension), and I think the same would need to be done here.
I think we should highlight it in documentation and tutorials much better than we do now, since indeed for any large enough problem it's a good idea, but it's hard to make something that bypasses hierarchical threading into a default.
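As a rough illustration of what an explicit opt-in could look like for a user, here is a minimal sketch. It assumes NonlinearSolve.jl, ADTypes.jl, and PolyesterForwardDiff.jl are all installed, and that the installed NonlinearSolve version accepts `AutoPolyesterForwardDiff` through the `autodiff` keyword of `NewtonRaphson`. The residual function `f!` and the problem size are made up for the example.

```julia
# Sketch only: assumes NonlinearSolve.jl, ADTypes.jl and PolyesterForwardDiff.jl
# are installed and that this NonlinearSolve version supports the backend below.
using NonlinearSolve
using ADTypes: AutoPolyesterForwardDiff
import PolyesterForwardDiff  # the backend package must be loaded to be usable

# Hypothetical in-place residual: du[i] = u[i]^2 - p
function f!(du, u, p)
    @. du = u^2 - p
    return nothing
end

u0 = ones(4)
prob = NonlinearProblem(f!, u0, 2.0)

# Explicit opt-in: the user chooses the Polyester-threaded forward-mode backend,
# rather than having it picked silently as a default.
sol = solve(prob, NewtonRaphson(; autodiff = AutoPolyesterForwardDiff()))
```

Because the backend is requested explicitly, a user who already runs their own threaded code can simply leave the keyword out and avoid the thread-locking concern described above.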
> We could construct the full jacobian and then compute the VJP but the default was based on the implementations available in SparseDiffTools and was not updated after that. Currently we maintain the JacobianOperator in house so we can easily switch that as well.

Oh, you're talking about the default VJP, for some line searches?
> Oh you're talking about the default vjp, for some line searches?

Yes, for some of the line searches, and also if you use a Krylov method like LSMR, which requires both $J^T v$ and $J u$.
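For readers less familiar with the notation, the two products can be written out as follows for a residual $f:\mathbb{R}^n \to \mathbb{R}^m$ with Jacobian $J = \partial f / \partial x$ (standard identities, not specific to this package):

$$
J u = \left.\frac{d}{d\epsilon} f(x + \epsilon u)\right|_{\epsilon = 0},
\qquad
J^T v = \nabla_x \left( v^T f(x) \right) \quad \text{with } v \text{ held constant}.
$$

Forward-mode AD evaluates $J u$ in a single pass, and reverse-mode AD evaluates $J^T v$ in a single pass. With forward mode alone, $J^T v$ has to be assembled from the columns of $J$:

$$
(J^T v)_i = v^T (J e_i), \qquad i = 1, \dots, n,
$$

which is the "construct the full Jacobian and then compute the VJP" option and costs $n$ directional derivatives (fewer passes with chunking). A finite-difference default instead approximates $J u \approx \big(f(x + \epsilon u) - f(x)\big)/\epsilon$, which carries truncation and round-off error.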
> That is why in OrdinaryDiffEq.jl it's always an opt-in (and maybe something we can make into an extension), and I think the same would need to be done here.

Do you have a link to the docs for that? We can make it consistent here.
It's not documented well, and it's used in a very different way. It's just that in some methods you can set `threads=PolyesterThreads()`. We should highlight it in the docs and make it into a package extension though.