KernelAbstractions.jl
                        Function calls in kernels not called
using KernelAbstractions

@kernel function kernel(v)
    I = @index(Global, Linear)
    @show I # loops over 1:10
end

function inlined_function(v, I)
    @show I
end

@kernel function nested_kernel(v)
    I = @index(Global, Linear)
    @show I # loops over 1:10 as well
    inlined_function(v, I)
end

v = ones(10)
wait(kernel(CPU())(v, ndrange = 10))
wait(nested_kernel(CPU())(v, ndrange = 10))
Edit: Updated with working code.
Ahhh, this makes sense... The error terminates the kernel execution.
What happens if you add a wait on nested_kernel(CPU())(v, ndrange = 10), e.g.
wait(nested_kernel(CPU())(v, ndrange = 10))
Without it, inlined_function will only be called once, and when we hit the error, execution will stop.
Mmh, you are right. I missed the wait when creating that MWE. Now I'm not sure what happened. This code works now. Thanks!
I updated the code above. Is that the right way to access the indices inside the functions? If I instead pass __ctx__ explicitly,
function inlined_function(v, __ctx__)
    I = @index(Global, Linear)
    @show I
end
it errors with
nested task error: MethodError: no method matching __index_Global_Linear(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}})
Hm, I think we can make it work, but we would need to change the way the macros act slightly...
I am not a big fan of supporting truly nested kernel functions, since the semantics get complicated (in CUDA you would get a sub-grid), but I think we might be able to pass the context further down.
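As an aside, the pattern in the working MWE above generalizes: compute the index once inside the @kernel body and hand it to ordinary helper functions as a plain argument, so nothing outside the macro ever needs __ctx__. A minimal sketch under that assumption (the names scale_element! and scaled_kernel! are made up, and the event-based wait launch API matches the version used in this thread):

```julia
using KernelAbstractions

# Ordinary function: it receives the index as a plain value,
# so it needs no access to the kernel's __ctx__ at all.
function scale_element!(v, I, factor)
    v[I] *= factor
end

@kernel function scaled_kernel!(v, factor)
    I = @index(Global, Linear)
    scale_element!(v, I, factor)  # plain call; the compiler is free to inline it
end

v = ones(10)
wait(scaled_kernel!(CPU())(v, 2.0, ndrange = length(v)))
@assert all(v .== 2.0)  # every element was scaled exactly once
```

Since the helper is just a regular Julia function, the same code also works when the kernel is launched on a GPU backend, as long as indices are always computed in the @kernel body and passed down.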