KernelAbstractions.jl

Enzyme autodiff produces out-of-bounds error for some kernels.

Open jlk9 opened this issue 1 year ago • 0 comments

Running this code using the current versions of Enzyme, KA, and CUDA.jl:

using KernelAbstractions
using CUDA

using Enzyme

function advanceTimeLevels!(field; backend=CUDABackend())

    nthreads = 64

    kernel2d! = advance_2d_array(backend, nthreads)
    
    kernel2d!(field, ndrange=size(field)[1])
end

@kernel function advance_2d_array(field)
    j = @index(Global, Linear)
    if j < 101
        @inbounds field[j,1] = field[j,2]
    end
    @synchronize()
end

field = CUDA.CuArray(ones(100, 2))

d_field = Enzyme.make_zero(field)

autodiff(Enzyme.Reverse, advanceTimeLevels!, Duplicated(field, d_field))

@show field
@show d_field

produces this error (can add more of the stacktrace if needed):

ERROR: a BoundsError was thrown during kernel execution on thread (37, 1, 1) in block (2, 1, 1).
Out-of-bounds array access

Since there are 64 threads per block, thread 37 of block 2 corresponds to global index 64 + 37 = 101, which is out of bounds for the first dimension of field (size 100). But the kernel guards the access with the condition j < 101, so it should never touch a row index greater than 100. If we run the function advanceTimeLevels! without autodiff, no error occurs. If we run autodiff with a block size that divides the first dimension of the array, such as nthreads = 100 or nthreads = 50, no error occurs either.
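For reference, the index arithmetic above can be checked in plain Julia without a GPU. This is only a sketch: the helper names global_index and launched_threads are made up for illustration, and it assumes the launch is rounded up to a whole number of blocks, which would explain why only block sizes that do not divide 100 leave threads past the end of the array.

```julia
# Hypothetical helpers for illustration only; not part of KernelAbstractions or CUDA.jl.

# 1-based global linear index of a thread, given its 1-based block and thread ids.
global_index(block, thread, nthreads) = (block - 1) * nthreads + thread

# Threads launched if the grid is rounded up to whole blocks (an assumption here).
launched_threads(n, nthreads) = cld(n, nthreads) * nthreads

n = 100                                  # size(field, 1) in the reproducer
idx = global_index(2, 37, 64)            # thread (37, 1, 1) in block (2, 1, 1)

# Number of launched threads whose global index exceeds n, per block size.
extra = Dict(nt => launched_threads(n, nt) - n for nt in (64, 50, 100))

@show idx
@show extra
```

With nthreads = 64 there are threads with global indices 101 through 128 that must be excluded by the bounds condition, while with nthreads = 50 or 100 there are none, which matches the block sizes for which the error disappears.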

@wsmoses @michel2323

jlk9 avatar Jul 10 '24 18:07 jlk9