KernelAbstractions.jl
Enzyme autodiff produces out-of-bounds error for some kernels.
Running this code with the current versions of Enzyme, KernelAbstractions (KA), and CUDA.jl:
using KernelAbstractions
using CUDA
using Enzyme

function advanceTimeLevels!(field; backend=CUDABackend())
    nthreads = 64
    kernel2d! = advance_2d_array(backend, nthreads)
    kernel2d!(field, ndrange=size(field)[1])
end

@kernel function advance_2d_array(field)
    j = @index(Global, Linear)
    if j < 101
        @inbounds field[j,1] = field[j,2]
    end
    @synchronize()
end

field = CUDA.CuArray(ones(100, 2))
d_field = Enzyme.make_zero(field)
autodiff(Enzyme.Reverse, advanceTimeLevels!, Duplicated(field, d_field))
@show field
@show d_field
produces this error (can add more of the stacktrace if needed):
ERROR: a BoundsError was thrown during kernel execution on thread (37, 1, 1) in block (2, 1, 1).
Out-of-bounds array access
Since there are 64 threads per block, thread 37 of block 2 corresponds to global index 64 + 37 = 101, which is out of bounds for the first dimension of field (100 rows). But the kernel has a conditional (j < 101) that prevents any access to the array at an index greater than 100.

If we run the function advanceTimeLevels! without autodiff, no error occurs. If we run autodiff with a block size that divides the array length, such as nthreads = 100 or nthreads = 50, no error occurs either.
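To make the index arithmetic concrete, here is a small CPU-only sketch in plain Julia. It assumes the launch is rounded up to whole blocks of nthreads work-items, which is what the thread and block numbers in the error message suggest. The helpers global_index and padded_items are hypothetical names used only for illustration; they are not part of KernelAbstractions.jl, CUDA.jl, or Enzyme.

```julia
# Hypothetical helpers for illustration only (not library API).

# 1-based global linear index of a work-item, given its 1-based
# block and thread numbers and the block size.
global_index(block, thread, nthreads) = (block - 1) * nthreads + thread

# Global indices that exist only because the launch is rounded up to
# whole blocks (assumption: the backend launches cld(n, nthreads) blocks).
padded_items(n, nthreads) = (n + 1):(cld(n, nthreads) * nthreads)

n = 100  # size(field, 1) in the reproducer

# Thread 37 of block 2 with 64 threads per block is global index 101.
@show global_index(2, 37, 64)

# With nthreads = 64 there are 28 work-items past the end of the range;
# with nthreads = 50 or 100 there are none.
for nthreads in (64, 50, 100)
    extra = padded_items(n, nthreads)
    println("nthreads = ", nthreads, ": ", length(extra), " padded work-items")
end
```

Under that assumption, the failing configuration is the only one of the three that launches work-items with an index above 100, which matches the observation that block sizes dividing the array length do not trigger the error.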
@wsmoses @michel2323