DiffEqGPU.jl
Out of Dynamic GPU memory in EnsembleGPUKernel for higher number of threads when using ContinuousCallback
Hello, I was testing the new updates to terminate! with EnsembleGPUKernel. It works fine with DiscreteCallback, but with ContinuousCallback I still run out of dynamic GPU memory in EnsembleGPUKernel for a higher number of threads. I attach the code I used:
using StaticArrays
using CUDA
using DiffEqGPU
using NPZ
using OrdinaryDiffEq
using Plots
"""
pot_central(u,p,t)
u=[x,dx,y,dy]
p=[k,m]
"""
function pot_central(u,p,t)
    r3 = ( u[1]^2 + u[3]^2 )^(3/2)
    du1 = u[2]                              # u[2] = dx
    du2 = -( p[1]*u[1] ) / ( p[2]*r3 )
    du3 = u[4]                              # u[4] = dy
    du4 = -( p[1]*u[3] ) / ( p[2]*r3 )
    return SVector{4}(du1,du2,du3,du4)
end
T = 100.0
k = 1.0
m = 1.0
trajectories = 5_000
u_rand = convert(Array{Float64},npzread("IO_GPU/IO_u0.npy"))
u0 = @SVector [2.0; 2.0; 1.0; 1.5]
p = @SVector [k,m]
tspan = (0.0,T)
prob = ODEProblem{false}(pot_central,u0,tspan,p)
prob_func = (prob,i,repeat) -> remake(prob, u0 = SVector{4}(u_rand[i,:]).*u0 + @SVector [1.0;1.0;1.0;1.0] )
Ensemble_Problem = EnsembleProblem(prob,prob_func=prob_func,safetycopy=false)
function condition(u,t,integrator)
    R2 = @SVector [4.5, 5_000.0]            # R2 = [Rmin2, Rmax2]
    r2 = u[1]*u[1] + u[3]*u[3]
    (R2[2] - r2)*(r2 - R2[1])               # append < 0.0 to get the Bool needed for the DiscreteCallback variant
end
affect!(integrator) = terminate!(integrator)
gpu_cb = ContinuousCallback(condition, affect!;save_positions=(false,false),rootfind=true,interp_points=0,abstol=1e-7,reltol=0)
#gpu_cb = DiscreteCallback(condition, affect!;save_positions=(false,false))
CUDA.@time sol = solve(Ensemble_Problem,
    GPUTsit5(),
    #GPUVern7(),
    #GPUVern9(),
    EnsembleGPUKernel(),
    trajectories = trajectories,
    batch_size = 10_000,
    adaptive = false,
    dt = 0.01,
    save_everystep = false,
    callback = gpu_cb,
    merge_callbacks = true
)
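For the DiscreteCallback variant commented out above, the condition must return a Bool instead of a sign-changing scalar; that is what the commented-out < 0.0 in condition is for. A minimal sketch of that variant (the names condition_discrete and discrete_cb are illustrative, not from the original script):

# Boolean condition for the DiscreteCallback variant: terminate once the
# trajectory leaves the annulus Rmin2 < r2 < Rmax2. Sketch only.
function condition_discrete(u, t, integrator)
    R2 = @SVector [4.5, 5_000.0]            # R2 = [Rmin2, Rmax2]
    r2 = u[1]*u[1] + u[3]*u[3]
    (R2[2] - r2)*(r2 - R2[1]) < 0.0         # true once outside the allowed band
end
discrete_cb = DiscreteCallback(condition_discrete, affect!; save_positions = (false, false))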
What GPU? A100? Is it just the memory scaling? Is it fine with a higher dt?
The GPU is an A30. This is the error that comes out for trajectories = 50_000 and dt = 0.1:
ERROR: Out of dynamic GPU memory (trying to allocate 912 bytes)
ERROR: Out of dynamic GPU memory (trying to allocate 912 bytes)
ERROR: Out of dynamic GPU memory (trying to allocate 912 bytes)
... (the line above repeats many thousands of times) ...
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
... (these two lines repeat, interleaved, later also as "ERROR: a (null) was thrown during kernel execution.") ...
Excessive output truncated after 542774 bytes.
KernelException: exception thrown during kernel execution on device NVIDIA A30
Stacktrace:
[1] check_exceptions()
@ CUDA ~/.julia/packages/CUDA/Ey3w2/src/compiler/exceptions.jl:34
[2] synchronize(stream::CuStream; blocking::Nothing)
@ CUDA ~/.julia/packages/CUDA/Ey3w2/lib/cudadrv/stream.jl:134
[3] synchronize
@ ~/.julia/packages/CUDA/Ey3w2/lib/cudadrv/stream.jl:121 [inlined]
[4] (::CUDA.var"#185#186"{SVector{4, Float64}, Matrix{SVector{4, Float64}}, Int64, CuArray{SVector{4, Float64}, 2, CUDA.Mem.DeviceBuffer}, Int64, Int64})()
@ CUDA ~/.julia/packages/CUDA/Ey3w2/src/array.jl:420
[5] #context!#63
@ ~/.julia/packages/CUDA/Ey3w2/lib/cudadrv/state.jl:164 [inlined]
[6] context!
@ ~/.julia/packages/CUDA/Ey3w2/lib/cudadrv/state.jl:159 [inlined]
[7] unsafe_copyto!(dest::Matrix{SVector{4, Float64}}, doffs::Int64, src::CuArray{SVector{4, Float64}, 2, CUDA.Mem.DeviceBuffer}, soffs::Int64, n::Int64)
@ CUDA ~/.julia/packages/CUDA/Ey3w2/src/array.jl:406
[8] copyto!
@ ~/.julia/packages/CUDA/Ey3w2/src/array.jl:360 [inlined]
[9] copyto!
@ ~/.julia/packages/CUDA/Ey3w2/src/array.jl:364 [inlined]
[10] copyto_axcheck!(dest::Matrix{SVector{4, Float64}}, src::CuArray{SVector{4, Float64}, 2, CUDA.Mem.DeviceBuffer})
@ Base ./abstractarray.jl:1127
[11] Array
@ ./array.jl:626 [inlined]
...
@ ~/.julia/packages/CUDA/Ey3w2/src/utilities.jl:25 [inlined]
[18] top-level scope
@ ~/.julia/packages/CUDA/Ey3w2/src/pool.jl:490 [inlined]
[19] top-level scope
@ ~/FAMAF/Beca_CIN_Trabajo_Final/skymap/GPU_Julia/pot_central_GPU_Float64.ipynb:0
Smaller batches or higher dt? Did you calculate out the batch memory size requirement?
For trajectories = 5_000 and dt = 0.1, the first time I ran the code it worked, but the second time I got the error.
Using DiscreteCallback, I tested it with trajectories = 10_000_000 and dt = 0.01 and it works fine. In version 1.24 of the library I had the same error.
It also fails for trajectories = 5_000, dt = 0.1, and batch_size = 1_000.
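For reference, a rough back-of-the-envelope of the batch memory requirement asked about above (a sketch only, using the values from the script; the buffers DiffEqGPU actually allocates per batch may differ, and with save_everystep = false far less than this worst case is stored):

using StaticArrays

# Worst-case size of the saved solution for one batch if every step were kept
# (sketch only; not necessarily what DiffEqGPU allocates internally).
T, dt      = 100.0, 0.01
nsteps     = Int(cld(T, dt)) + 1                            # ≈ saved steps per trajectory
per_step   = sizeof(SVector{4, Float64}) + sizeof(Float64)  # state (32 B) + time stamp (8 B)
batch_size = 10_000
println("≈ ", batch_size * nsteps * per_step / 1e9, " GB per batch")  # ≈ 4 GB here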
This happens due to an allocation within a kernel (in the case of StaticArrays code, typically due to escape analysis going wrong). You can spot it by prefixing the code that launches kernels with @device_code_llvm dump_module=true and looking for calls to @gpu_gc_pool_alloc or @gpu_malloc:
julia> @device_code_llvm dump_module=true solve(Ensemble_Problem,
           GPUTsit5(),
           #GPUVern7(),
           #GPUVern9(),
           EnsembleGPUKernel(),
           trajectories = trajectories,
           batch_size = 10_000,
           adaptive = false,
           dt = 0.01,
           save_everystep = false,
           callback = gpu_cb,
           merge_callbacks = true
       )
; @ /home/tim/Julia/depot/packages/DiffEqGPU/JlHvl/src/perform_step/gpu_tsit5_perform_step.jl:85 within `tsit5_kernel`
; ┌ @ /home/tim/Julia/depot/packages/DiffEqGPU/JlHvl/src/integrators/types.jl:320 within `gputsit5_init`
; │┌ @ /home/tim/Julia/depot/packages/DiffEqGPU/JlHvl/src/integrators/types.jl:13 within `GPUTsit5Integrator`
%31 = call fastcc {}* @gpu_gc_pool_alloc([1 x i64] %state, i64 912)