DiffEqGPU.jl
GPU Parallel Ensemble simulation with random time span
Hi, for my application I would like to simulate a large number of trajectories (1000 to 20000) of the same system, with different parameters but also with different time spans, for a Monte Carlo simulation. However, when attempting this, I get an error from DiffEqGPU.jl:
ERROR: LoadError: AssertionError: all((p->begin #= /home/tangui/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:278 =# p.tspan == (probs[1]).tspan end), probs)
I understand that the function checks whether all problems have the same tspan, but why is that? I guess it has to do with the way the problem is broadcast to the GPU under some fixed array shape. I think I could benefit greatly from switching to GPU computation, especially for the reduced-precision performance, so DiffEqGPU seemed like a good fit.
Many thanks!
Here is the code to reproduce the error:
using DiffEqGPU, OrdinaryDiffEq
# global variables to be set from the Python side through the Julia "Main" namespace
N_real = 5000
N_grid = 256
T = 150+273
m87 = 1.44316060e-25
k_B = 1.38064852e-23
window = 2.5e-3
Gamma = 1.0 + 0*im
Omega13 = 1.0 + 0*im
Omega23 = 1.0 + 0*im
gamma21tilde = 1.0 + 0*im
gamma31tilde = 1.0 + 0*im
gamma32tilde = 1.0 + 0*im
waist = 1.0e-3 + 0*im
r0 = 1.0 + 0*im
x0 = ComplexF32[5/8 + 0*im, 3/8 + 0*im, 0 + 0*im, 0 + 0*im, 0 + 0*im, 0 + 0*im, 0 + 0*im, 0 + 0*im]
k = Float32(2*pi/780.241e-9)
v = 40.0f0
function choose_points()::Tuple{Int32, Int32, Int32, Int32}
    # list every grid point lying on one of the four edges of the cell
    edges = Array{Tuple{Int32, Int32}, 1}(undef, 4*N_grid)
    for i in 1:N_grid
        edges[i] = (1, i)
        edges[i+N_grid] = (N_grid, i)
        edges[i+2*N_grid] = (i, 1)
        edges[i+3*N_grid] = (i, N_grid)
    end
    # redraw the exit point until it shares neither a row nor a column
    # with the entry point
    iinit, jinit = edges[rand(1:length(edges))]
    ifinal, jfinal = iinit, jinit
    cdtn = (ifinal == iinit) || (jfinal == jinit)
    while cdtn
        ifinal, jfinal = edges[rand(1:length(edges))]
        cdtn = (ifinal == iinit) || (jfinal == jinit)
    end
    return (iinit, jinit, ifinal, jfinal)
end
function draw_vz(v::Float32)::Float32
    # rejection-sample a longitudinal velocity from the thermal distribution,
    # keeping only draws with |vz| <= |v|
    vz = abs(2*v)
    while abs(vz) > abs(v)
        vz = randn()*sqrt(k_B*T/m87)
    end
    return vz
end
function f!(dy::Array{ComplexF32, 1}, x::Array{ComplexF32, 1},
            p::Array{ComplexF32, 1}, t::Float32)
    # unpack the per-trajectory parameters
    v, u0, u1, xinit, yinit = p[1], p[2], p[3], p[4], p[5]
    Gamma, Omega13, Omega23 = p[6], p[7], p[8]
    gamma21tilde, gamma31tilde, gamma32tilde = p[9], p[10], p[11]
    waist, r0 = p[12], p[13]
    # squared distance from the beam axis along the straight-line trajectory
    r_sq = (xinit+u0*v*t - r0)^2 + (yinit+u1*v*t - r0)^2
    # Gaussian beam profile modulates the Rabi frequencies
    Om23 = Omega23 * exp(-r_sq/(2*waist*waist))
    Om13 = Omega13 * exp(-r_sq/(2*waist*waist))
    dy[1] = (-Gamma/2)*x[1]-(Gamma/2)*x[2]+(im*conj(Om13)/2)*x[5]-(im*Om13/2)*x[6]+Gamma/2
    dy[2] = (-Gamma/2)*x[1]-(Gamma/2)*x[2]+(im*conj(Om23)/2)*x[7]-(im*Om23/2)*x[8]+Gamma/2
    dy[3] = -gamma21tilde*x[3]+(im*conj(Om23)/2)*x[5]-(im*Om13/2)*x[8]
    dy[4] = -conj(gamma21tilde)*x[4] - (im*Om23/2)*x[6] + (im*conj(Om13)/2)*x[7]
    dy[5] = im*Om13*x[1] + (im*Om13/2)*x[2] + (im*Om23/2)*x[3] - gamma31tilde*x[5]-im*Om13/2
    dy[6] = -im*conj(Om13)*x[1]-im*(conj(Om13)/2)*x[2]-(im*conj(Om23)/2)*x[4]-conj(gamma31tilde)*x[6]+im*conj(Om13)/2
    dy[7] = (im*Om23/2)*x[1]+im*Om23*x[2]+(im*Om13/2)*x[4]-gamma32tilde*x[7] - im*Om23/2
    dy[8] = (-im*conj(Om23)/2)*x[1]-im*conj(Om23)*x[2]-(im*conj(Om13)/2)*x[3]-conj(gamma32tilde)*x[8]+im*conj(Om23)/2
    return nothing
end
paths = [[] for i=1:N_real]
xs = [[] for i=1:N_real]
ts = [[] for i=1:N_real]
v_perps = zeros(Float32, N_real)
function prob_func(prob, i, repeat)
    # change seed of random number generation as the solver uses multithreading
    iinit, jinit, ifinal, jfinal = choose_points()
    vz = draw_vz(v)
    v_perp = sqrt(v^2 - vz^2)
    # entry and exit coordinates in physical units
    xinit = jinit*window/N_grid
    yinit = iinit*window/N_grid
    xfinal = jfinal*window/N_grid
    yfinal = ifinal*window/N_grid
    # velocity unit vector
    if v_perp != 0
        u0 = xfinal - xinit
        u1 = yfinal - yinit
        norm = hypot(u0, u1)
        u0 /= norm
        u1 /= norm
        new_tfinal = hypot(xfinal - xinit, yfinal - yinit)/v_perp
    else
        u0 = u1 = 0
        new_tfinal = hypot(xfinal - xinit, yfinal - yinit)/abs(vz)
    end
    new_p = ComplexF32[v_perp + 0*im, u0 + 0*im, u1 + 0*im, xinit + 0*im, yinit + 0*im,
                       Gamma, Omega13, Omega23, gamma21tilde, gamma31tilde - im*k*vz,
                       gamma32tilde - im*k*vz, waist, r0]
    # each trajectory gets its own time span: this is what triggers the assertion
    new_tspan = (0.0f0, Float32(new_tfinal))
    tsave = collect(LinRange{Float32}(0.0f0, new_tfinal, 1000))
    remake(prob, p=new_p, tspan=new_tspan, saveat=tsave)
end
# instantiate a problem
p = ComplexF32[1.0 + 0*im, 1.0 + 0*im, 1.0 + 0*im, 1.0 + 0*im, 1.0 + 0*im,
               Gamma, Omega13, Omega23,
               gamma21tilde, gamma31tilde - im*k*0.0,
               gamma32tilde - im*k*0.0, waist, r0]
tspan = (0.0f0, 1.0f0)
tsave = collect(LinRange{Float32}(0.0f0, 1.0f0, 1000))
prob = ODEProblem{true}(f!, x0, tspan, p, saveat=tsave)
ensembleprob = EnsembleProblem(prob, prob_func=prob_func)
@time sol = solve(ensembleprob, BS3(), EnsembleGPUArray(), trajectories=N_real)
Yeah, this won't work in the current design, but there's a new version hopefully coming soon.
Hi,
I just ran into the same issue here: I'm trying to generate a lot of trajectories, with the start of integration t0 randomly distributed over a time span [tstart, tstop]. To avoid aliasing in space, I need to integrate the trajectories with saveat points tsample + k*Δt, with k in [0, ..., n] and tsample random in [t0, t0 + dt]. (That is: the integration time points are evenly separated by Δt, but the first sample is randomly offset after launch, and the launch is random over the full time span I want to study.)
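For concreteness, here's a minimal sketch of the sampling I mean (tstart, tstop, dt, Δt and n are illustrative placeholder values):
# Build one saveat grid per trajectory: evenly spaced samples t_k = tsample + k*Δt,
# where tsample is randomly offset after a random launch time t0.
tstart, tstop = 0.0, 10.0   # launch times are spread over [tstart, tstop]
dt = 0.01                   # the first sample falls in [t0, t0 + dt]
Δt = 0.1                    # fixed spacing between saved samples
n = 100                     # samples per trajectory

function make_saveat()
    t0 = tstart + rand()*(tstop - tstart)   # random launch time
    tsample = t0 + rand()*dt                # random first-sample offset
    return collect(tsample .+ (0:n) .* Δt)  # t_k = tsample + k*Δt, k = 0..n
end
So each trajectory would need its own tspan and its own saveat, which is exactly what the assertion currently forbids.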
Any update on this issue?
Thanks
The new version should go online by the end of April and it's like 100x faster.
Great news! Thanks for the amazing work :)
I'm glad to hear this, as I'm also trying to integrate with varying time spans! In my case, each trajectory's time span is uniquely determined by the prob_func, and there can be large variation between them. Looking forward to the new version :)
Hi, I just saw that the new version came out a few days ago. Unfortunately the problem still persists: random time spans do not work. Do you see any change in the foreseeable future? Thanks!
The new solver exists, and this feature is possible with it, but it's just not exposed to the front end right now. We're working on further optimizations and callback support first.
Great news! Do you have any idea of the time frame for when it will be released? This will actually steer a lot of my research. In any case, thanks again for the great work!
"Soon"ish, but we have another paper due this week so that takes priority before we come back to this 😅
@utkarsh530 can you address this with EnsembleGPUKernel?
Hi Chris, I just saw your message. I will try an EnsembleGPUKernel implementation during my lunch break and post the results here (I hope it can help!).
Using EnsembleGPUKernel yields the same error. I guess I would gain more by fixing the same tspan for all my problems (the biggest one) and putting everything on the GPU, rather than trying to tailor the tspan to each problem. While I understand my use case might be too specific, maybe it would be possible to just add a warning in the documentation that all problems need to share the same tspan? In any case, many thanks again for the help and dedication!
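For reference, the workaround I have in mind looks roughly like this (an untested sketch; trajectory_tfinal and trajectory_params are hypothetical stand-ins for the per-trajectory setup done in prob_func above):
# Give every problem the same (longest) time span so the assertion passes;
# samples past a trajectory's own final time are discarded in post-processing.
# `trajectory_tfinal` and `trajectory_params` are hypothetical helpers standing
# in for the geometry and parameter draws done in prob_func above.
tfinals = [Float32(trajectory_tfinal(i)) for i in 1:N_real]
tmax = maximum(tfinals)
tsave_shared = collect(LinRange{Float32}(0.0f0, tmax, 1000))

function prob_func_shared(prob, i, repeat)
    remake(prob, p=trajectory_params(i), tspan=(0.0f0, tmax), saveat=tsave_shared)
end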
EnsembleGPUKernel doesn't do this yet 😅. But in theory it could get added, so I tagged @utkarsh530 about it. Not ready to close, but probably close.
I think it's a bit tricky to add this feature, because each solve would produce differently sized arrays, which do not fit well into the CuMatrix that stores all of the ts and us. However, I can see it working with save_everystep == false && saveat === nothing, or with a constant size of ts (where the solution is being saved).
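A configuration along those lines might look like this untested sketch, using one of the lower-level kernel solvers such as GPUTsit5 (the exact front-end API may differ between DiffEqGPU versions):
# With save_everystep = false and no saveat, each solve stores a fixed number
# of time points, so differing tspans would not change the stored array sizes.
sol = solve(ensembleprob, GPUTsit5(), EnsembleGPUKernel(),
            trajectories = N_real, save_everystep = false)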