Valentin Churavy
Hm, I think we can make it work. But we need to change the way the macros act slightly... I am not a big fan of supporting truly nested kernel...
Can you try https://github.com/JuliaGPU/KernelAbstractions.jl/pull/291?
Thanks, managed to reproduce!
So it is really weird:

### KernelAbstraction:
```
ld.param.u64 %rd10, [_Z17julia_gpu_kernel_7ContextI14__CUDACtx_Name16CompilerMetadataI10StaticSizeI4_1__E12DynamicCheckvv7NDRangeILi1E10StaticSizeI4_1__E10StaticSizeI4_1__EvvEEv14__PassType_422v12DisableHooksE12_gpu_kernel_1SI7Float32E13CuDeviceArrayI7Float32Li1E6GlobalE13CuDeviceArrayI7Float32Li1E6GlobalE7Float324Bool_param_2+8];
and.b64 %rd3, %rd10, 1099511627775;
cvta.to.global.u64 %rd17, %rd3;
ld.global.f32 %f6, [%rd17];
```

### CUDAnative pure
```
ld.param.u64 %rd3, [_Z22julia_cn_kernel__183791SI7Float32E13CuDeviceArrayI7Float32Li1E6GlobalE13CuDeviceArrayI7Float32Li1E6GlobalE7Float324Bool_param_2+8];
cvta.to.global.u64...
```
duplicate of #13, needs better docs though.
```
function vmap!(f, out, args...)
    @kernel function kernelF(f::F, out, args...) where F
        I = @index(Global, Linear)
        elems = ntuple(Val(length(args))) do i
            Base.@_inline_meta
            @inbounds args[i][I]
        end
        @inbounds out[I] = f(elems...)
    end
    ...
```
I am off on vacation, so I won't partake in this discussion for the next two weeks. KA is built on top of Cassette so it can indeed change the...
I think the design space is still quite open. [Tullio.jl](https://github.com/mcabbott/Tullio.jl) is something like that for fans of Einstein notation. I have my own playground where I explore infrastructure ideas. I...
This is coming from CUDA.jl and there are two different cases: the first being an [`invoke`](https://github.com/JuliaGPU/GPUCompiler.jl/blob/6db8bdb1f8377aab54af0f7ffde49102278d730e/src/validation.jl#L183) and the second being a [`jl_apply_generic`](https://github.com/JuliaGPU/GPUCompiler.jl/blob/6db8bdb1f8377aab54af0f7ffde49102278d730e/src/validation.jl#L207). The former we could give more information,...
Needs to update the Project.toml?