Valentin Churavy
We don't want people to use string interpolation in kernels, à la `@print("ii = $ii; ij = $ij; ik = $ik; bi = $bi; groupsize() = $(groupsize())\n")`.
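One way to avoid runtime string interpolation is to split the interpolated string at macro-expansion time and hand the pieces to a varargs print call. The snippet below is only a sketch of that idea in plain Julia. `split_interpolation` and `@splat_print` are hypothetical names made up for illustration, not part of KernelAbstractions, and the sketch does not claim to show how `@print` is implemented.

```julia
# Hypothetical helper (not a KernelAbstractions API): turn an interpolated
# string expression into the list of pieces a varargs print call would take.
function split_interpolation(ex)
    # An interpolated string literal parses to Expr(:string, pieces...).
    if ex isa Expr && ex.head === :string
        return collect(Any, ex.args)
    end
    # A plain string (or any other expression) is passed through as one piece.
    return Any[ex]
end

# Hypothetical macro: rewrites `@splat_print("a = $a\n")` into
# `print("a = ", a, "\n")`, so no string is built by interpolation at run time.
macro splat_print(ex)
    pieces = map(esc, split_interpolation(ex))
    return :(print($(pieces...)))
end

# Inspect the pieces the macro would see for an interpolated literal.
pieces = split_interpolation(:("ii = $ii; ij = $ij\n"))

# Usage on the host: equivalent to print("ii = ", ii, "; ij = ", ij, "\n").
ii, ij = 1, 2
@splat_print("ii = $ii; ij = $ij\n")
```

The point of the split is that each piece (a literal or a value) can then be handled separately by whatever printing facility a backend offers, instead of requiring a dynamically allocated `String` inside the kernel.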
So the idea is that we have a common output format for the CPU backend as well as the GPU backend. This implements NVTXT so that you can load these...
@dpsanders
```
using KernelAbstractions
using OffsetArrays

@kernel function update!(A, @Const(B))
    i, j = @index(Global, NTuple)
    acc = zero(eltype(A))
    for m in -1:1
        for n in -1:1
            acc += B[i+m, j+n]...
```
In the example below, the stacktrace points to `src/macros.jl:212`, but it should rather have pointed to the kernel function itself. In general, due to the code motion we are doing, the line...
Still fails with an illegal memory access in the kernel.
x-ref: https://github.com/vchuravy/GPUifyLoops.jl/issues/91
https://github.com/JuliaGPU/KernelAbstractions.jl/blob/1497d4109857c239a3d407943a0f2323c2bfc396/src/backends/cuda.jl#L99 x-ref: https://github.com/vchuravy/GPUifyLoops.jl/issues/100
https://github.com/vchuravy/GPUifyLoops.jl/issues/103
https://github.com/vchuravy/GPUifyLoops.jl/issues/104
Since kernel launches are event-based rather than stream-based, we need a way to time the kernels themselves efficiently, using the event system.