DiffEqGPU.jl
GPU Parallel Ensemble simulation with random time span
Hi, for my application I would like to simulate a large number of trajectories (1000 to 20000) of the same system, with different parameters but also with different time spans, for a Monte Carlo simulation. However, when attempting this, I get an error from DiffEqGPU.jl:
ERROR: LoadError: AssertionError: all((p->begin #= /home/tangui/.julia/packages/DiffEqGPU/Ibo20/src/DiffEqGPU.jl:278 =# p.tspan == (probs[1]).tspan end), probs)
I understand that the function checks whether all problems have the same tspan, but why is that? I guess it has to do with the way the problem is broadcast to the GPU under some fixed array shape. I think I could benefit greatly from switching to GPU computation, especially for the reduced-precision performance, so DiffEqGPU seemed like a good fit.
Many thanks!
Here is the code to reproduce the error:
using DiffEqGPU, OrdinaryDiffEq
# global variables to be set from the Python side through the Julia "Main" namespace
N_real = 5000
N_grid = 256
T = 150+273
m87 = 1.44316060e-25
k_B = 1.38064852e-23
window = 2.5e-3
Gamma = 1.0 + 0*im
Omega13 = 1.0 + 0*im
Omega23 = 1.0 + 0*im
gamma21tilde = 1.0 + 0*im
gamma31tilde = 1.0 + 0*im
gamma32tilde = 1.0 + 0*im
waist = 1.0e-3 + 0*im
r0 = 1.0 + 0*im
x0 = ComplexF32[5/8 + 0*im, 3/8 + 0*im, 0 + 0*im, 0 + 0*im, 0 + 0*im, 0 + 0*im, 0 + 0*im, 0 + 0*im]
k = Float32(2*pi/780.241e-9)
v = 40.0f0
function choose_points()::Tuple{Int32, Int32, Int32, Int32}
    # list every grid point lying on one of the four edges of the cell
    edges = Array{Tuple{Int32, Int32}, 1}(undef, 4*N_grid)
    for i in 1:N_grid
        edges[i] = (1, i)
        edges[i+N_grid] = (N_grid, i)
        edges[i+2*N_grid] = (i, 1)
        edges[i+3*N_grid] = (i, N_grid)
    end
    # redraw the exit point until it shares neither a row nor a column
    # with the entry point
    iinit, jinit = edges[rand(1:length(edges))]
    ifinal, jfinal = iinit, jinit
    cdtn = (ifinal == iinit) || (jfinal == jinit)
    while cdtn
        ifinal, jfinal = edges[rand(1:length(edges))]
        cdtn = (ifinal == iinit) || (jfinal == jinit)
    end
    return (iinit, jinit, ifinal, jfinal)
end
function draw_vz(v::Float32)::Float32
    # rejection-sample a longitudinal velocity from the thermal distribution,
    # keeping only draws with |vz| <= |v|
    vz = abs(2*v)
    while abs(vz) > abs(v)
        vz = randn()*sqrt(k_B*T/m87)
    end
    return vz
end
function f!(dy::Array{ComplexF32, 1}, x::Array{ComplexF32, 1},
            p::Array{ComplexF32, 1}, t::Float32)
    # unpack the per-trajectory parameters
    v, u0, u1, xinit, yinit = p[1], p[2], p[3], p[4], p[5]
    Gamma, Omega13, Omega23 = p[6], p[7], p[8]
    gamma21tilde, gamma31tilde, gamma32tilde = p[9], p[10], p[11]
    waist, r0 = p[12], p[13]
    # squared distance from the beam axis along the straight-line trajectory
    r_sq = (xinit+u0*v*t - r0)^2 + (yinit+u1*v*t - r0)^2
    # Gaussian beam profile modulates the Rabi frequencies
    Om23 = Omega23 * exp(-r_sq/(2*waist*waist))
    Om13 = Omega13 * exp(-r_sq/(2*waist*waist))
    dy[1] = (-Gamma/2)*x[1]-(Gamma/2)*x[2]+(im*conj(Om13)/2)*x[5]-(im*Om13/2)*x[6]+Gamma/2
    dy[2] = (-Gamma/2)*x[1]-(Gamma/2)*x[2]+(im*conj(Om23)/2)*x[7]-(im*Om23/2)*x[8]+Gamma/2
    dy[3] = -gamma21tilde*x[3]+(im*conj(Om23)/2)*x[5]-(im*Om13/2)*x[8]
    dy[4] = -conj(gamma21tilde)*x[4] - (im*Om23/2)*x[6] + (im*conj(Om13)/2)*x[7]
    dy[5] = im*Om13*x[1] + (im*Om13/2)*x[2] + (im*Om23/2)*x[3] - gamma31tilde*x[5]-im*Om13/2
    dy[6] = -im*conj(Om13)*x[1]-im*(conj(Om13)/2)*x[2]-(im*conj(Om23)/2)*x[4]-conj(gamma31tilde)*x[6]+im*conj(Om13)/2
    dy[7] = (im*Om23/2)*x[1]+im*Om23*x[2]+(im*Om13/2)*x[4]-gamma32tilde*x[7] - im*Om23/2
    dy[8] = (-im*conj(Om23)/2)*x[1]-im*conj(Om23)*x[2]-(im*conj(Om13)/2)*x[3]-conj(gamma32tilde)*x[8]+im*conj(Om23)/2
    return nothing
end
paths = [[] for i=1:N_real]
xs = [[] for i=1:N_real]
ts = [[] for i=1:N_real]
v_perps = zeros(Float32, N_real)
function prob_func(prob, i, repeat)
    # change seed of random number generation as the solver uses multithreading
    iinit, jinit, ifinal, jfinal = choose_points()
    vz = draw_vz(v)
    v_perp = sqrt(v^2 - vz^2)
    # entry and exit coordinates in physical units
    xinit = jinit*window/N_grid
    yinit = iinit*window/N_grid
    xfinal = jfinal*window/N_grid
    yfinal = ifinal*window/N_grid
    # velocity unit vector
    if v_perp != 0
        u0 = xfinal - xinit
        u1 = yfinal - yinit
        norm = hypot(u0, u1)
        u0 /= norm
        u1 /= norm
        new_tfinal = hypot(xfinal - xinit, yfinal - yinit)/v_perp
    else
        u0 = u1 = 0
        new_tfinal = hypot(xfinal - xinit, yfinal - yinit)/abs(vz)
    end
    new_p = ComplexF32[v_perp + 0*im, u0 + 0*im, u1 + 0*im, xinit + 0*im, yinit + 0*im,
                       Gamma, Omega13, Omega23, gamma21tilde, gamma31tilde - im*k*vz,
                       gamma32tilde - im*k*vz, waist, r0]
    # each trajectory gets its own time span: this is what triggers the assertion
    new_tspan = (0.0f0, Float32(new_tfinal))
    tsave = collect(LinRange{Float32}(0.0f0, new_tfinal, 1000))
    remake(prob, p=new_p, tspan=new_tspan, saveat=tsave)
end
# instantiate a problem
p = ComplexF32[1.0 + 0*im, 1.0 + 0*im, 1.0 + 0*im, 1.0 + 0*im, 1.0 + 0*im,
               Gamma, Omega13, Omega23,
               gamma21tilde, gamma31tilde - im*k*0.0,
               gamma32tilde - im*k*0.0, waist, r0]
tspan = (0.0f0, 1.0f0)
tsave = collect(LinRange{Float32}(0.0f0, 1.0f0, 1000))
prob = ODEProblem{true}(f!, x0, tspan, p, saveat=tsave)
ensembleprob = EnsembleProblem(prob, prob_func=prob_func)
@time sol = solve(ensembleprob, BS3(), EnsembleGPUArray(), trajectories=N_real)
Yeah, this won't work in the current design, but there's a new version hopefully coming soon.
Hi,
I just ran into the same issue here: I'm trying to generate a lot of trajectories, with the start of integration t0 randomly distributed over a time span [tstart, tstop]. To avoid aliasing in space, I need to integrate the trajectories with saveat points tsample + k*Δt, with k in [0, ..., n] and tsample random in [t0, t0 + dt]. (That is: the integration time points are evenly separated by Δt, but the first sample is randomly offset after launch, and the launch is random over the full time span I want to study.)
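For concreteness, here's a minimal sketch of the sampling I mean (tstart, tstop, dt, Δt and n are illustrative placeholder values):
# Build one saveat grid per trajectory: evenly spaced samples t_k = tsample + k*Δt,
# where tsample is randomly offset after a random launch time t0.
tstart, tstop = 0.0, 10.0   # launch times are spread over [tstart, tstop]
dt = 0.01                   # the first sample falls in [t0, t0 + dt]
Δt = 0.1                    # fixed spacing between saved samples
n = 100                     # samples per trajectory

function make_saveat()
    t0 = tstart + rand()*(tstop - tstart)   # random launch time
    tsample = t0 + rand()*dt                # random first-sample offset
    return collect(tsample .+ (0:n) .* Δt)  # t_k = tsample + k*Δt, k = 0..n
end
So each trajectory would need its own tspan and its own saveat, which is exactly what the assertion currently forbids.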
Any update on this issue?
Thanks
The new version should go online by the end of April and it's like 100x faster.
Great news! Thanks for the amazing work :)
I'm glad to hear this, as I'm also trying to integrate with varying time spans! In my case, each trajectory's time span is uniquely determined by the prob_func, and there can be large variation between them. Looking forward to the new version :)
Hi, I just saw that the new version came out a few days ago. Unfortunately the problem still persists: random time spans do not work. Do you see any change in the foreseeable future? Thanks!
The new solver exists, and this feature is possible with it, but it's just not exposed to the front end right now. We're working on further optimizations and callback support first.
Great news! Do you have any idea of the time frame for when it will be released? This will actually steer a lot of my research. In any case, thanks again for the great work!
"Soon"ish, but we have another paper due this week so that takes priority before we come back to this 😅
@utkarsh530 can you address this with EnsembleGPUKernel?
Hi Chris, I just saw your message. I will try an EnsembleGPUKernel implementation during my lunch break and post the results here (I hope it can help!).
Using EnsembleGPUKernel yields the same error. I guess I would gain more by fixing the same tspan for all my problems (the biggest one) and putting everything on the GPU, rather than trying to tailor the tspan to each problem. While I understand my use case might be too specific, maybe it would be possible to just add a warning in the documentation that all problems need to share the same tspan? In any case, many thanks again for the help and dedication!
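For reference, the workaround I have in mind looks roughly like this (an untested sketch; trajectory_tfinal and trajectory_params are hypothetical stand-ins for the per-trajectory setup done in prob_func above):
# Give every problem the same (longest) time span so the assertion passes;
# samples past a trajectory's own final time are discarded in post-processing.
# `trajectory_tfinal` and `trajectory_params` are hypothetical helpers standing
# in for the geometry and parameter draws done in prob_func above.
tfinals = [Float32(trajectory_tfinal(i)) for i in 1:N_real]
tmax = maximum(tfinals)
tsave_shared = collect(LinRange{Float32}(0.0f0, tmax, 1000))

function prob_func_shared(prob, i, repeat)
    remake(prob, p=trajectory_params(i), tspan=(0.0f0, tmax), saveat=tsave_shared)
end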
EnsembleGPUKernel doesn't do this yet 😅. But in theory it could get added, so I tagged @utkarsh530 about it. Not ready to close, but probably close.
I think it's a bit tricky to add this feature, because each solve would produce differently sized arrays, which do not fit well into the CuMatrix that stores all of the ts and us. However, I can see it working with save_everystep == false && saveat === nothing, or with a constant size of ts (where the solution is being saved).
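A configuration along those lines might look like this untested sketch, using one of the lower-level kernel solvers such as GPUTsit5 (the exact front-end API may differ between DiffEqGPU versions):
# With save_everystep = false and no saveat, each solve stores a fixed number
# of time points, so differing tspans would not change the stored array sizes.
sol = solve(ensembleprob, GPUTsit5(), EnsembleGPUKernel(),
            trajectories = N_real, save_everystep = false)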