DiffEqGPU.jl
DiffEqGPU.jl copied to clipboard
GPU Parallel Ensemble Simulations with generic linear algebra
Hello,
I'm trying to solve and ODE with my GPU. I tested the example code present in the DiffEqGPU.js repository, and it worked fine.
I have the following function to implement in the ODE Problem
function drho_dt(rho, p, t)
A, w_l= p
return ( L + A * sin(w_l * t) * L_t ) * rho
end
or equivalently
function drho_dt(drho, rho, p, t)
A, w_l = p
mul!(drho, L + A * sin(w_l * t) * L_t, rho)
return nothing
end
where A and w_l are two numbers, while L and L_t are two global matrices. Following the documentation, I used the following code to run a Parallel Ensemble:
A = 0.05
w_l = 1.0
p = [A, w_l]
tspan = (0.0, 10.0 / gam_q)
w_l_l = LinRange{Float64}(0.1, 1.9, 10)
function prob_func(prob,i,repeat)
remake(prob,p=[prob.p[1], w_l_l[i]])
end
prob = ODEProblem(drho_dt, rho0_vec, tspan, p)
ensemble_prob = EnsembleProblem(prob, prob_func=prob_func, safetycopy=false)
@time sim = solve(ensemble_prob, BS3(), METHOD!!, trajectories=length(w_l_l))
where METHOD!! can be one of the methods, e.g. EnsembleSerial(), EnsembleThreads(), EnsembleCPUArray() or EnsembleGPUArray().
The problem is divided in two cases:
- When
LandL_tare sparse matrices. In this case all the methods work, except forEnsembleGPUArray(). In this case the error is
InvalidIRError: compiling kernel gpu_gpu_kernel_oop(Cassette.Context{nametype(CUDACtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel_oop), typeof(drho_dt), CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{Float64, 1}, Float64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
[1] overdub
@ ./In[8]:62
[2] macro expansion
@ ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:25
[3] overdub
@ ~/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
[4] overdub
@ ~/.julia/packages/Cassette/N5kbV/src/overdub.jl:0
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:27
[2] overdub
@ ~/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
[3] overdub
@ ~/.julia/packages/Cassette/N5kbV/src/overdub.jl:0
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel_oop), typeof(drho_dt), CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{Float64, 1}, Float64}}}, args::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/wHjqZ/src/validation.jl:111
[2] macro expansion
@ ~/.julia/packages/GPUCompiler/wHjqZ/src/driver.jl:319 [inlined]
[3] macro expansion
@ ~/.julia/packages/TimerOutputs/ZQ0rt/src/TimerOutput.jl:236 [inlined]
[4] macro expansion
@ ~/.julia/packages/GPUCompiler/wHjqZ/src/driver.jl:317 [inlined]
[5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/wHjqZ/src/utils.jl:62
[6] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/02Kjq/src/compiler/execution.jl:317
[7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/wHjqZ/src/cache.jl:89
[8] cufunction(f::typeof(Cassette.overdub), tt::Type{Tuple{Cassette.Context{nametype(CUDACtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel_oop), typeof(drho_dt), CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{Float64, 1}, Float64}}; name::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/packages/CUDA/02Kjq/src/compiler/execution.jl:288
[9] macro expansion
@ ~/.julia/packages/CUDA/02Kjq/src/compiler/execution.jl:102 [inlined]
[10] (::KernelAbstractions.Kernel{CUDAKernels.CUDADevice, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.gpu_gpu_kernel_oop)})(::Function, ::Vararg{Any, N} where N; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Int64, progress::Function)
@ CUDAKernels ~/.julia/packages/CUDAKernels/8wtKq/src/CUDAKernels.jl:194
[11] SciML/DifferentialEquations.jl#55
@ ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:414 [inlined]
[12] ODEFunction
@ ~/.julia/packages/SciMLBase/oTP8b/src/scimlfunctions.jl:334 [inlined]
[13] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5, true, CuArray{ComplexF64, 2}, Nothing, Float64, CuArray{Float64, 2}, Float64, Float64, Float64, Float64, Vector{CuArray{ComplexF64, 2}}, ODESolution{ComplexF64, 3, Vector{CuArray{ComplexF64, 2}}, Nothing, Nothing, Vector{Float64}, Vector{Vector{CuArray{ComplexF64, 2}}}, ODEProblem{CuArray{ComplexF64, 2}, Tuple{Float64, Float64}, true, CuArray{Float64, 2}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CuArray{ComplexF64, 2}}, Vector{Float64}, Vector{Vector{CuArray{ComplexF64, 2}}}, OrdinaryDiffEq.Tsit5Cache{CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}}, OrdinaryDiffEq.DEOptions{Float64, Float64, Float64, Float64, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryMinHeap{Float64}, DataStructures.BinaryMinHeap{Float64}, Nothing, Nothing, Int64, Tuple{}, Tuple{}, Tuple{}}, CuArray{ComplexF64, 2}, ComplexF64, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5Cache{CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/perform_step/low_order_rk_perform_step.jl:623
[14] __init(prob::ODEProblem{CuArray{ComplexF64, 2}, Tuple{Float64, Float64}, true, CuArray{Float64, 2}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::Tuple{}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/solve.jl:456
[15] #__solve#466
@ ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/solve.jl:4 [inlined]
[16] #solve_call#58
@ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:61 [inlined]
[17] solve_up(prob::ODEProblem{CuArray{ComplexF64, 2}, Tuple{Float64, Float64}, true, CuArray{Float64, 2}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{ComplexF64, 2}, p::CuArray{Float64, 2}, args::Tsit5; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
@ DiffEqBase ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:82
[18] #solve#59
@ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:70 [inlined]
[19] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{ComplexF64}, p::Matrix{Float64}; kwargs::Base.Iterators.Pairs{Symbol, DiffEqGPU.var"#12#18", Tuple{Symbol}, NamedTuple{(:unstable_check,), Tuple{DiffEqGPU.var"#12#18"}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:319
[20] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Iterators.Pairs{Symbol, DiffEqGPU.var"#12#18", Tuple{Symbol}, NamedTuple{(:unstable_check,), Tuple{DiffEqGPU.var"#12#18"}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:284
[21] macro expansion
@ ./timing.jl:287 [inlined]
[22] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:201
[23] #solve#61
@ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:96 [inlined]
[24] top-level scope
@ ./timing.jl:210 [inlined]
[25] top-level scope
@ ./In[9]:0
[26] eval
@ ./boot.jl:360 [inlined]
[27] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1094
- When
LandL_tare dense matrices. In this case not evenEnsembleCPUArray()works, but the others yes. The error for the GPU method should be the same. While the error for the CPU one is
TaskFailedException
nested task error: conversion to pointer not defined for Base.Experimental.Const{ComplexF64, 2}
Stacktrace:
[1] error(::String)
@ ./error.jl:33 [inlined]
[2] overdub
@ ./error.jl:33 [inlined]
[3] unsafe_convert(::Type{Ptr{ComplexF64}}, ::Base.Experimental.Const{ComplexF64, 2})
@ ./pointer.jl:67 [inlined]
[4] overdub
@ ./pointer.jl:67 [inlined]
[5] unsafe_convert(::Type{Ptr{ComplexF64}}, ::SubArray{ComplexF64, 1, Base.Experimental.Const{ComplexF64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ ./subarray.jl:427 [inlined]
[6] overdub
@ ./subarray.jl:427 [inlined]
[7] overdub
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/blas.jl:704 [inlined]
[8] overdub
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:544 [inlined]
[9] mul!(::Vector{ComplexF64}, ::Matrix{ComplexF64}, ::SubArray{ComplexF64, 1, Base.Experimental.Const{ComplexF64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::Bool, ::Bool)
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:66 [inlined]
[10] overdub
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:66 [inlined]
[11] mul!(::Vector{ComplexF64}, ::Matrix{ComplexF64}, ::SubArray{ComplexF64, 1, Base.Experimental.Const{ComplexF64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:275 [inlined]
[12] overdub
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:275 [inlined]
[13] overdub
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:51 [inlined]
[14] overdub(::Cassette.Context{nametype(CPUCtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, ::typeof(*), ::Matrix{ComplexF64}, ::SubArray{ComplexF64, 1, Base.Experimental.Const{ComplexF64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ Cassette ~/.julia/packages/Cassette/N5kbV/src/overdub.jl:0
[15] overdub
@ ./In[11]:62 [inlined]
[16] macro expansion
@ ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:25 [inlined]
[17] overdub
@ ~/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:265 [inlined]
[18] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{KernelAbstractions.CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.cpu_gpu_kernel_oop)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}, args::Tuple{typeof(drho_dt), Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{Float64}, Float64}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
@ KernelAbstractions ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:157
[19] __run(obj::KernelAbstractions.Kernel{KernelAbstractions.CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.cpu_gpu_kernel_oop)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}, args::Tuple{typeof(drho_dt), Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{Float64}, Float64}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
@ KernelAbstractions ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:130
[20] (::KernelAbstractions.var"#33#34"{Tuple{KernelAbstractions.NoneEvent}, Nothing, typeof(KernelAbstractions.__run), Tuple{KernelAbstractions.Kernel{KernelAbstractions.CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.cpu_gpu_kernel_oop)}, Tuple{Int64}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}, Tuple{typeof(drho_dt), Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{Float64}, Float64}, KernelAbstractions.NDIteration.NoDynamicCheck}})()
@ KernelAbstractions ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:22
Stacktrace:
[1] wait
@ ./task.jl:322 [inlined]
[2] wait
@ ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:65 [inlined]
[3] wait
@ ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:64 [inlined]
[4] (::DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)})(du::Matrix{ComplexF64}, u::Matrix{ComplexF64}, p::Matrix{Float64}, t::Float64)
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:414
[5] ODEFunction
@ ~/.julia/packages/SciMLBase/oTP8b/src/scimlfunctions.jl:334 [inlined]
[6] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5, true, Matrix{ComplexF64}, Nothing, Float64, Matrix{Float64}, Float64, Float64, Float64, Float64, Vector{Matrix{ComplexF64}}, ODESolution{ComplexF64, 3, Vector{Matrix{ComplexF64}}, Nothing, Nothing, Vector{Float64}, Vector{Vector{Matrix{ComplexF64}}}, ODEProblem{Matrix{ComplexF64}, Tuple{Float64, Float64}, true, Matrix{Float64}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Matrix{ComplexF64}}, Vector{Float64}, Vector{Vector{Matrix{ComplexF64}}}, OrdinaryDiffEq.Tsit5Cache{Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{ComplexF64}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{ComplexF64}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}}, OrdinaryDiffEq.DEOptions{Float64, Float64, Float64, Float64, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryMinHeap{Float64}, DataStructures.BinaryMinHeap{Float64}, Nothing, Nothing, Int64, Tuple{}, Tuple{}, Tuple{}}, Matrix{ComplexF64}, ComplexF64, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5Cache{Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{ComplexF64}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/perform_step/low_order_rk_perform_step.jl:623
[7] __init(prob::ODEProblem{Matrix{ComplexF64}, Tuple{Float64, Float64}, true, Matrix{Float64}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::Tuple{}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/solve.jl:456
[8] #__solve#466
@ ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/solve.jl:4 [inlined]
[9] #solve_call#58
@ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:61 [inlined]
[10] solve_up(prob::ODEProblem{Matrix{ComplexF64}, Tuple{Float64, Float64}, true, Matrix{Float64}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::Matrix{ComplexF64}, p::Matrix{Float64}, args::Tsit5; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
@ DiffEqBase ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:82
[11] #solve#59
@ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:70 [inlined]
[12] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5, ensemblealg::EnsembleCPUArray, I::UnitRange{Int64}, u0::Matrix{ComplexF64}, p::Matrix{Float64}; kwargs::Base.Iterators.Pairs{Symbol, DiffEqGPU.var"#12#18", Tuple{Symbol}, NamedTuple{(:unstable_check,), Tuple{DiffEqGPU.var"#12#18"}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:319
[13] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleCPUArray, I::UnitRange{Int64}; kwargs::Base.Iterators.Pairs{Symbol, DiffEqGPU.var"#12#18", Tuple{Symbol}, NamedTuple{(:unstable_check,), Tuple{DiffEqGPU.var"#12#18"}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:284
[14] macro expansion
@ ./timing.jl:287 [inlined]
[15] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleCPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:201
[16] #solve#61
@ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:96 [inlined]
[17] top-level scope
@ ./timing.jl:210 [inlined]
[18] top-level scope
@ ./In[11]:0
[19] eval
@ ./boot.jl:360 [inlined]
[20] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1094
Yes, I don't think it can generate generic linear algebra calls.
Did you try using StaticArrays?
Did you try using StaticArrays?
I'm using very large matrices (usually 1500x1500), this is the reason i use sparse matrices. However, also with a 100x100 matrix it tooks a lot of time to convert L to a Static Array
SMatrix{N, N}(Matrix(L))
That makes no sense to solve with EnsembleGPUArray. Read the DiffEqGPU documentation on the two forms of GPU problems. You're using the form for 100 ODEs or less, not the one for 1000 ODEs or more.
https://github.com/SciML/DiffEqGPU.jl#within-method-gpu-parallelism-with-direct-cuarray-usage
That's more applicable here.
Do you mean that I need to convert L and L_t into CuArray and than evolve each ODE serially with EnsembleSerial()? If it is the case, can i use EnsembleThreads() to parallelize that CUDA ODE again?
That would evolve the ODE in parallel on the GPU. You want to use EnsembleSerial unless you have multiple GPUs which you would want to use at the same time.
Ok perfect. Thank you for your help.
However I have the following question. I'm yet able to perform this ODE directly on the GPU, using the differential function
L = CUDA.CUSPARSE.CuSparseMatrixCSR( L )
L_t = CUDA.CUSPARSE.CuSparseMatrixCSR( L_t )
function f(rho, p, t)
A, w_l = p
rho_tmp = similar(rho0_vec)
return L * rho + CUDA.CUSPARSE.mv!('N', A * sin(w_l * t), L_t, rho, 0.0, rho_tmp, '0')
end
where this time L and L_t are CuSparse matrices, and they don't take up a lot of memory as the dense matrix form.
This method is a little bit faster than the CPU sparse one, but of course slower if I parallelize the CPU method in the Ensemble Simulation.
Whit this CUDA sparse method the GPU is loaded only up to the 10%, and so the memory. If I could be able to parallelize all the ODEs in the ensemble I think it would get a lot faster than the EnsembleThreads() method.
Why there is no way to implement this method for an EnsembleGPUArray() simulation?
To GPU it like this, you'd want to expand it out to its scalar form with something like ModelingToolkit, in order to then do the GPU codegen. EnsembleGPUArray is all about kernel code generation, so it's not using CUDA kernels like those from CUSPARSE since it's really made to generate nonlinear kernels.