DiffEqFlux.jl
Regularization for faster neural ODEs
https://arxiv.org/abs/2007.04504 from @jessebett. Have you thought about using TaylorSeries.jl for a Julia implementation?
Yes! Early on in the project I was using TaylorSeries.jl for proof of concept. We were having a few issues with it when we decided to use JAX (https://github.com/google/jax/issues/520#). I believe I described some of those issues in a Zygote issue https://github.com/FluxML/Zygote.jl/issues/130.
My memory of the issue is that TaylorSeries.jl is highly scalar-oriented: it does not handle vector-valued, univariate Taylor series well. That is, `x0 + x1 * t + 1/2 * x2 * t^2`, where `x0`, `x1`, and `x2` are vectors. On a previous JuliaML video call we discussed structs-of-arrays vs. arrays-of-structs, and agreed that in principle there shouldn't be an issue. But at the time, TaylorSeries.jl required us to represent this as an array of Taylor numbers, not as a single "TaylorArray". Making that work with Zygote and Flux NNs, right around the transition from Tracker to Zygote, proved difficult. At the time the Taylor primitives (or "chain rules", per @oxinabox) relied heavily on in-place mutation for efficiency, and Zygote didn't like that. There was too much friction at the interface between DiffEqFlux, Zygote's mutation branch, and TaylorSeries.jl when I last used it seriously. I don't know how much has changed since.
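To make the struct-of-arrays point concrete, here is a minimal Python sketch of the kind of single "TaylorArray" object we wanted: one coefficient array per order, with arithmetic done order-wise. The `TaylorArray` class and its methods are hypothetical illustrations, not TaylorSeries.jl's API:

```python
import numpy as np

class TaylorArray:
    """A single vector-valued, univariate truncated Taylor series
    x(t) = coeffs[0] + coeffs[1]*t + coeffs[2]*t^2 + ...
    stored struct-of-arrays style: one array per order, each of shape (n,).
    Hypothetical sketch; not the TaylorSeries.jl API."""

    def __init__(self, coeffs):
        self.coeffs = [np.asarray(c, dtype=float) for c in coeffs]

    def __add__(self, other):
        # Addition is order-wise and elementwise in the vector dimension.
        return TaylorArray([a + b for a, b in zip(self.coeffs, other.coeffs)])

    def __mul__(self, other):
        # Cauchy product of the series, truncated to the shorter order;
        # elementwise in the vector dimension.
        order = min(len(self.coeffs), len(other.coeffs))
        return TaylorArray([
            sum(self.coeffs[i] * other.coeffs[k - i] for i in range(k + 1))
            for k in range(order)
        ])

    def __call__(self, t):
        # Evaluate at a scalar t via Horner's rule.
        acc = np.zeros_like(self.coeffs[0])
        for c in reversed(self.coeffs):
            acc = acc * t + c
        return acc

# x(t) = x0 + x1*t + (1/2)*x2*t^2 with vector coefficients
x0, x1, x2 = np.array([1., 2.]), np.array([3., 4.]), np.array([5., 6.])
x = TaylorArray([x0, x1, 0.5 * x2])
print(x(2.0))  # x0 + 2*x1 + 2*x2 -> [17. 22.]
```

The point is that the whole vector series is one object, so each arithmetic operation is a handful of array-level operations, rather than an array of scalar Taylor numbers where every element carries its own coefficient bookkeeping.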
Since those early days of the project we have learned a lot more about Taylor AD. I am confident that nice implementations of forward mode can be extended to support Taylor mode. Per rules 22 and 23 of Griewank's Rules of AD:
> 22. Forward differentiation can be encapsulated in Taylor arithmetic.
> 23. What works for first derivatives is fit to yield higher derivatives by extension to Taylor arithmetic.
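As a concrete instance of rule 23, here is a small Python sketch showing how the order-1 forward-mode rule for `exp` (`(exp(x))' = exp(x) * x'`) extends to arbitrary truncated order via a coefficient recurrence. The recurrence is standard truncated Taylor arithmetic; the function name is my own:

```python
import math

def taylor_exp(x):
    """Propagate exp through a truncated Taylor series.

    Given coefficients x = [x_0, x_1, ..., x_{n-1}] of x(t), return the
    coefficients of y(t) = exp(x(t)) using the recurrence
        k * y_k = sum_{j=1..k} j * x_j * y_{k-j},
    which at k = 1 reduces to the familiar forward-mode dual-number rule
    y_1 = exp(x_0) * x_1.
    """
    n = len(x)
    y = [math.exp(x[0])] + [0.0] * (n - 1)
    for k in range(1, n):
        y[k] = sum(j * x[j] * y[k - j] for j in range(1, k + 1)) / k
    return y

# Expand around t0 = 0 with x(t) = t, i.e. coefficients [0, 1, 0, 0]:
coeffs = taylor_exp([0.0, 1.0, 0.0, 0.0])
print(coeffs)  # [1.0, 1.0, 0.5, 0.1666...] -- the series of exp(t)
```

Every smooth primitive has such a recurrence, so a forward-mode implementation whose rules are written against truncated polynomial coefficients, instead of a single tangent, gets higher derivatives essentially for free. That is what makes extending a nice forward-mode AD to Taylor mode plausible.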
So, I'm not sure about TaylorSeries.jl. To be clear, there is much to gain from TaylorSeries.jl in terms of API; it's an awesome package. But the backend, for this kind of project, should be built from the perspective of vector-first, forward-mode AD.