KernelAbstractions.jl
Function calls in kernels not called
using KernelAbstractions

@kernel function kernel(v)
    I = @index(Global, Linear)
    @show I # loops over 1:10
end

function inlined_function(v, I)
    @show I
end

@kernel function nested_kernel(v)
    I = @index(Global, Linear)
    @show I # loops over 1:10 as well (originally it only reached 1:1)
    inlined_function(v, I)
end

v = ones(10)
wait(kernel(CPU())(v, ndrange = 10))
wait(nested_kernel(CPU())(v, ndrange = 10))
Edit: Updated with working code.
Ahhh, this makes sense... The error terminates the kernel execution.
What happens if you add a wait on nested_kernel(CPU())(v, ndrange = 10), e.g.
wait(nested_kernel(CPU())(v, ndrange = 10))?
The inlined_function will only be called once, and when we hit the error, execution will stop.
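For illustration, a minimal sketch of why the missing wait hides the failure, using the same event-based launch API as the code above (failing_kernel and the error("boom") call are made up for this example):

using KernelAbstractions

@kernel function failing_kernel(v)
    I = @index(Global, Linear)
    error("boom") # hypothetical failure inside the kernel body
end

v = ones(10)
ev = failing_kernel(CPU())(v, ndrange = 10) # the launch returns an event immediately
# The kernel runs asynchronously; without wait(ev) the error may never surface
# in the calling task. wait(ev) synchronizes and rethrows it as a nested task error.
wait(ev)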
Mmh, you are right. I missed the wait when creating that MWE. Now I'm not sure what happened. This code works now. Thanks!
I updated the code above. Is that the right way to access the indices inside called functions? If I pass __ctx__ instead, as in
function inlined_function(v, __ctx__)
    I = @index(Global, Linear)
    @show I
end
it errors with
nested task error: MethodError: no method matching __index_Global_Linear(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}})
Hm, I think we can make it work, but we need to change the way the macros act slightly...
I am not a big fan of supporting truly nested kernel functions, since the semantics get complicated (in CUDA you would get a sub-grid), but I think we might be able to pass the context further down.
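For reference, a minimal sketch of the pattern that does work today (and that the updated MWE above already uses): compute the index inside the @kernel body and hand it to the helper as an ordinary argument, so the helper never needs the hidden __ctx__. The names apply_at and outer_kernel are just illustrative:

using KernelAbstractions

# Plain helper: receives the index as a normal argument, no @index inside.
@inline function apply_at(v, I)
    v[I] = 2 * v[I]
end

@kernel function outer_kernel(v)
    I = @index(Global, Linear) # indices are only available inside the @kernel body
    apply_at(v, I)             # pass the index down explicitly
end

v = ones(10)
wait(outer_kernel(CPU())(v, ndrange = 10))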