
Function calls in kernels not called

Open · michel2323 opened this issue 3 years ago · 3 comments

using KernelAbstractions

@kernel function kernel(v)
    I = @index(Global, Linear)
    @show I # loops over 1:10
end

function inlined_function(v, I)
    @show I
end

@kernel function nested_kernel(v)
    I = @index(Global, Linear)
    @show I # in the original MWE this only printed I = 1
    inlined_function(v, I)
end

v = ones(10)
wait(kernel(CPU())(v, ndrange = 10))
wait(nested_kernel(CPU())(v, ndrange = 10))

Edit: Updated with working code.

michel2323 avatar Aug 25 '21 14:08 michel2323

Ahhh, this makes sense... The error terminates the kernel execution. What happens if you add a wait on nested_kernel(CPU())(v, ndrange = 10), e.g.

wait(nested_kernel(CPU())(v, ndrange = 10))

The inlined_function will only be called once, and when we hit the error, execution will stop.
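
A minimal sketch of that failure mode, assuming the event-based launch API used above (erroring_kernel and its error message are illustrative, not from the original MWE):

using KernelAbstractions

@kernel function erroring_kernel(v)
    I = @index(Global, Linear)
    error("simulated failure at index $I") # any error inside the kernel kills its task
end

v = ones(10)
ev = erroring_kernel(CPU())(v, ndrange = 10) # launching throws nothing by itself
wait(ev) # the nested task error only surfaces here

Without the final wait, the launch appears to succeed and the error is silently swallowed.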

vchuravy avatar Aug 25 '21 15:08 vchuravy

Mmh, you are right. I missed the wait when creating that MWE. Now I'm not sure what happened. This code works now. Thanks!

I updated the code above. Is that the right way to access the indices in such functions? If I instead pass __ctx__ down, as in

function inlined_function(v, __ctx__)
    I = @index(Global, Linear) # @index assumes a kernel context named __ctx__ is in scope
    @show I
end

it errors with

nested task error: MethodError: no method matching __index_Global_Linear(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}})
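
For now, resolving the index once at the top level of the kernel and passing the plain value down (as in the updated code above) works fine:

@kernel function nested_kernel(v)
    I = @index(Global, Linear) # resolve the index in the kernel body itself
    inlined_function(v, I)     # pass the plain index instead of __ctx__
end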

michel2323 avatar Aug 26 '21 05:08 michel2323

Hm, I think we can make it work. But we need to change the way the macros act slightly...

I am not a big fan of supporting truly nested kernel functions, since the semantics get complicated (in CUDA you would get a sub-grid), but I think we might be able to pass the context further down.

vchuravy avatar Aug 26 '21 09:08 vchuravy