Linus

Germany Please don't steal my API keys

Results: 37 comments of Linus

Continuous-adjoint methods for diagonal-noise SDEs scale in the square of number of dimensions

Executing the MWE on SciMLSensitivity#master now yields this:
```
julia> println(Zygote.gradient(ps -> loss(ps), ps_))
ERROR: Scalar indexing is disallowed. Invocation of getindex resulted in scalar indexing of a GPU array....
```

Same error on CPU:
```
julia> println(Zygote.gradient(ps -> loss(ps), ps_))
ERROR: BoundsError: attempt to access 12×1 ComponentMatrix{Float32, Matrix{Float32}, Tuple{Axis{(ps_drift = ViewAxis(1:6, Axis(weight = ViewAxis(1:4, ShapedAxis((2, 2), NamedTuple())), bias = ViewAxis(5:6,...
```

I'm pretty sure I can solve this; it seems that `stack` on a `ComponentVector` isn't behaving as expected.

> Zygote has a bug here that's easy to work around:
>
> ```julia
> using Lux, Zygote, DifferentialEquations, ComponentArrays, Random, SciMLSensitivity, LinearAlgebra
>
> p = [1.5, 1.0, 3.0, 1.0]...
> ```

> Zygote has a bug here

What is the bug exactly? Can we fix it? This workaround is O(n^2).

< deleted because I had an incorrect theory here - see below >

hmmm BUT we need to perturb this correctly with the noise. I can't figure out how torchsde / diffrax are doing this right now...

In `torchsde`, they never actually define `g` in the adjoint SDE; they only define `g_prod`, which is the product of `g` and the noise. So compare the implementation for EulerHeun: ```...
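
To make the point about `g_prod` concrete, here is a minimal Julia sketch of why multiplying by the noise directly avoids the quadratic cost for diagonal noise. The names `g_diag`, `g_prod_dense`, and `g_prod_diag` and the toy diffusion `0.1 .* u` are hypothetical and chosen for illustration only; this is not the code from `torchsde` or SciMLSensitivity.

```julia
using LinearAlgebra

# Hypothetical toy diffusion for diagonal noise: returns only the diagonal
# of the diffusion matrix as a length-n vector (assumption for illustration).
g_diag(u) = 0.1 .* u

# Dense route: materialize the full n×n matrix, then multiply by the noise.
# This costs O(n^2) memory and work even though only n entries are nonzero.
g_prod_dense(u, dW) = Matrix(Diagonal(g_diag(u))) * dW

# Product route: never form the matrix; multiply elementwise with the noise.
# This costs O(n) memory and work.
g_prod_diag(u, dW) = g_diag(u) .* dW

u  = collect(1.0:5.0)
dW = [0.5, -1.0, 0.25, 2.0, -0.5]

# Both routes give the same noise term.
@assert g_prod_dense(u, dW) ≈ g_prod_diag(u, dW)
```

Under these assumptions the two functions agree, so a solver that only ever needs the product of `g` with the noise has no reason to build the full matrix.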

update: there's a trivial solution!!

> ```
> # how to multiply tmp2 with dW such that dgrad * dW == tmp2 (*) dW?
> # how to multiply tmp1...
> ```

this of course gives us quite a performance boost:
```
m = 10000
(...)
function f(u, p, t)
    [p[1] * x - p[2] * t + p[3] * p[4] *...
```
