AbstractDifferentiation.jl

Planned backends to implement

Open sethaxen opened this issue 3 years ago • 14 comments

We should add backends for the following AD/FD packages:

  • [x] ForwardDiff
  • [x] ReverseDiff
  • [x] FiniteDifferences
  • [ ] all ChainRules-supporting ADs (see #11, #39)
  • [ ] FiniteDiff
  • [x] Tracker
  • [ ] Enzyme (#84)
  • [ ] Batched Zygote (https://github.com/JuliaDiff/AbstractDifferentiation.jl/issues/40#issuecomment-1029987127)
  • [ ] SparseDiffTools
  • [ ] Symbolics

sethaxen • Jan 26 '22 11:01

Can you also add Yota?

AriMKatz • Jan 26 '22 11:01

Yota is ChainRules-compatible, so it should be covered with the others.

sethaxen • Jan 26 '22 11:01

Make sure to add both Enzyme forward and reverse modes!

wsmoses • Jan 26 '22 18:01

Will do! Should I start with the public API? @frankschae said you had mentioned we might want to use some internal functions (he pointed me to https://github.com/wsmoses/Enzyme.jl/blob/2ce81ffa8f56c5bf44a4d85234c2110fa9d6eb0a/src/compiler.jl#L1745)

sethaxen • Jan 27 '22 08:01

I wouldn't go quite that low-level, to save yourself some common LLVM setup. Instead, I'd probably use the thunk level (https://github.com/wsmoses/Enzyme.jl/blob/2ce81ffa8f56c5bf44a4d85234c2110fa9d6eb0a/src/compiler.jl#L2700), which has options for a "combined" augmented forward pass + gradient, an augmented forward pass (storing values from the original function that need preservation), a standalone gradient (just running the reverse, using the stored values from an augmented forward pass), and forward-mode AD.

This is used, for example, to generate the high-level autodiff/fwddiff routines (https://github.com/wsmoses/Enzyme.jl/blob/2ce81ffa8f56c5bf44a4d85234c2110fa9d6eb0a/src/Enzyme.jl#L173) and is currently the highest-level point that exposes "split mode" (e.g. the split augmented forward pass and standalone gradient).

wsmoses • Jan 27 '22 18:01

I would like to add a "batch" version of Zygote as a backend which falls back on Zygote except for jacobian where the pullback is called with all the bases simultaneously (i.e. pb(I) where I is the identity matrix). This can be useful to preserve sparsity of Jacobians if all the rules are written in a way that preserves sparsity.
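To make the `pb(I)` idea concrete, here is a minimal sketch that does not use Zygote at all. The matrix `A`, the function `f`, and the helper `pullback_f` are hypothetical stand-ins for a function whose rule preserves sparsity; only `LinearAlgebra` and `SparseArrays` from the standard library are assumed.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical linear map with a sparse Jacobian (A is an illustrative choice).
A = sparse([1.0 0.0 0.0; 0.0 2.0 3.0])
f(x) = A * x

# Hand-written stand-in for what a sparsity-preserving rrule would return.
# The pullback accepts a single cotangent vector or a matrix of cotangents.
function pullback_f(x)
    y = f(x)
    pb(ybar) = A' * ybar
    return y, pb
end

x = [1.0, 2.0, 3.0]
y, pb = pullback_f(x)
m = length(y)

# Row by row: one pullback call per basis vector, giving a dense Jacobian.
E = Matrix(1.0I, m, m)
J_rows = reduce(vcat, [permutedims(pb(E[:, i])) for i in 1:m])

# Batched: a single call pb(I) with a sparse identity returns J', still sparse.
J_batched = sparse(pb(sparse(1.0I, m, m))')
```

Both routes give the same Jacobian here, but only the batched call keeps the sparse storage, and it needs one pullback call instead of `m`. Whether a real Zygote pullback accepts a matrix cotangent depends on how the underlying rules are written.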

mohamed82008 • Feb 04 '22 13:02

And a SparseDiffTools backend to optimise for sparsity structure.

mohamed82008 • Feb 04 '22 13:02

> I would like to add a "batch" version of Zygote as a backend which falls back on Zygote except for jacobian where the pullback is called with all the bases simultaneously (i.e. pb(I) where I is the identity matrix). This can be useful to preserve sparsity of Jacobians if all the rules are written in a way that preserves sparsity.

Is this a feature Zygote actually supports, or just something that sometimes works?

sethaxen • Feb 04 '22 14:02

It requires that the function being differentiated has independent actions on each column. For example, a neural network satisfies this.
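As a rough illustration of that column-independence condition, the sketch below checks it numerically for a toy dense layer and for a counter-example. The names `layer` and `coupled` and the values of `W` and `b` are made up for the example and do not come from any package.

```julia
# Hypothetical dense layer; W and b are illustrative values.
W = [1.0 2.0; 3.0 4.0; 5.0 6.0]
b = [0.1, 0.2, 0.3]
layer(X) = tanh.(W * X .+ b)

# Each column of X is an independent input (e.g. one sample of a batch).
X = [1.0 0.5 -1.0; 2.0 0.0 0.3]

# Applying the layer to the whole batch equals applying it column by column.
independent = all(layer(X)[:, j] ≈ layer(X[:, j]) for j in axes(X, 2))

# Counter-example: normalizing by a batch-wide sum couples the columns.
coupled(X) = X ./ sum(X)
still_independent = all(coupled(X)[:, j] ≈ coupled(X[:, j]) for j in axes(X, 2))
```

The first check passes and the second fails, so a function like `coupled` would not be safe to treat column by column.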

ChrisRackauckas • Feb 04 '22 14:02

> or just something that sometimes works?

Something that sometimes works. The goal is to make it easy to define a sparse Jacobian in a rrule and then get it back when calling Zygote.jacobian.

mohamed82008 • Feb 04 '22 15:02

Is adding Symbolics.jl also planned?

In my field (control theory), symbolic differentiation is used almost exclusively, because it is fast when derivatives need to be calculated many times: there is no overhead from allocations or from the AD's logic for calculating the forward pass. The same holds for machine learning with a constant graph, which could also benefit once common sub-expression elimination (CSE) in Symbolics.jl is fully functional.

The derivative would be calculated by symbolically tracing the function, generating the derivative/gradient/Jacobian function, and then passing the inputs to that generated function.

This would be useful once caches are added to this package: for Symbolics.jl, the cache would just be the generated derivative function, so calculating the derivative would incur no overhead.
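A minimal sketch of that workflow, assuming Symbolics.jl is installed and using its `@variables`, `Symbolics.gradient`, and `build_function`. The objective `f` and the variable names are illustrative only, and this is not how any AbstractDifferentiation.jl backend is implemented.

```julia
using Symbolics

# Illustrative objective; any function that Symbolics can trace would do.
f(x) = x[1]^2 + sin(x[2]) * x[1]

# Trace f once with symbolic inputs and differentiate symbolically.
@variables x1 x2
xs = [x1, x2]
grad_expr = Symbolics.gradient(f(xs), xs)

# Compile the symbolic gradient into ordinary Julia functions.
# For array-valued expressions, build_function returns an out-of-place
# and an in-place variant.
grad_oop, grad_ip! = build_function(grad_expr, xs; expression = Val{false})

# The "cache" is just the generated function; reuse it for every new input.
g = grad_oop([1.0, 2.0])
```

The tracing and code generation happen once; later calls to `grad_oop` only evaluate the generated expression.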

JTaets • Feb 07 '22 11:02

I think it would be good to support this. As you say, this would require support for caching. See #41.

sethaxen • Feb 07 '22 11:02

Via GitHub advanced search, I found some other AD packages as well:

  • gdalle/ImplicitDifferentiation.jl
  • avigliotti/AD4SM.jl
  • JuliaDiff/TaylorDiff.jl
  • abap34/JITrench.jl
  • sshin23/MadDiff.jl

https://github.com/search?l=&o=desc&q=Automatic+Differentiation+stars%3A%3E10+pushed%3A%3E2022-01-01+language%3AJulia&s=stars&type=Repositories

prbzrg • Feb 07 '23 03:02

> gdalle/ImplicitDifferentiation.jl

Actually, ImplicitDifferentiation.jl now uses AbstractDifferentiation.jl under the hood, to call any AD package as a backend. Can it be a backend itself? I don't think it's a good idea, so no need to include it on the list :)

gdalle • May 24 '23 07:05