Enzyme + KA stalls on error instead of reporting it
When device = CPU() and Julia is started with more than one thread (e.g. -t16), the program stalls.
MWE:
using Enzyme
using KernelAbstractions
using KernelGradients

linear_threads(::CPU) = Threads.nthreads()

Base.zeros(::CPU, ::Type{T}, shape) where T = zeros(T, shape)
Base.ones(::CPU, ::Type{T}, shape) where T = ones(T, shape)
Base.rand(::CPU, ::Type{T}, shape) where T = rand(T, shape)

function ∇spherical_harmonics!(∂L∂x, ∂L∂y, x, y, device)
    n = size(x, 2)
    ∇! = Enzyme.autodiff(
        spherical_harmonics_kernel!(device, linear_threads(device)))
    wait(∇!(Duplicated(y, ∂L∂y), Duplicated(x, ∂L∂x); ndrange=n))
end

@kernel function spherical_harmonics_kernel!(encodings, @Const(directions))
    i = @index(Global)

    x = directions[1, i]
    y = directions[2, i]
    z = directions[3, i]

    encodings[1, i] = 0.28209479177387814f0
    encodings[2, i] = -0.48860251190291987f0 * y
    encodings[3, i] = 0.48860251190291987f0 * z
    encodings[4, i] = -0.48860251190291987f0 * x
end

function main()
    device = CPU()
    n = 1024

    x = rand(device, Float32, (3, n))
    y = zeros(device, Float32, (4, n))
    ∂L∂y = ones(device, Float32, (4, n))
    ∂L∂x = zeros(device, Float32, (3, n))

    ∇spherical_harmonics!(∂L∂x, ∂L∂y, x, y, device)
end

main()
Details:
- Julia 1.8.0-rc1
]st
[7da242da] Enzyme v0.10.1
[63c18a36] KernelAbstractions v0.8.2
[e5faadeb] KernelGradients v0.1.2
https://github.com/JuliaGPU/KernelAbstractions.jl/issues/298 seems to be a related issue.
Can you redirect the output to a file and post that?
And you are saying that instead of terminating upon error it's just hanging and waiting?
Ah no, it's stalling on the CPU and erroring on the GPU.
@vchuravy here's the output: error.txt
It is for when device = CUDADevice()
Thanks. For the CPU part, can you try https://docs.julialang.org/en/v1.8.0-rc1/stdlib/Profile/#Triggered-During-Execution
Actually for the CPU I just needed to run Julia with one thread (as opposed to auto): cpu-error.txt
Just as a note, do you see this line in the output?
@exception9 = private unnamed_addr constant [25 x i8] c"undefined variable error\00", align 1
That means you have an undefined variable. Likely directions and encodings.
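For readers unfamiliar with that message, here is a minimal plain-Julia sketch of the same class of mistake, outside of any kernel. The function `scale!` and the misspelled name `inputs` are hypothetical and are not taken from the MWE; the sketch only illustrates how a body that refers to a name not matching its arguments produces an undefined variable error at run time.

```julia
# Hypothetical function: the arguments are named `out` and `input`,
# but the body mistakenly refers to `inputs`, which is defined nowhere.
function scale!(out, input)
    out[1] = 2 * inputs[1]   # typo: should be `input`
    return out
end

# Call the function and capture whatever exception it throws.
err = try
    scale!(zeros(Float32, 1), ones(Float32, 1))
    nothing
catch e
    e
end

println(typeof(err))
```

On the CPU with plain Julia this surfaces as an `UndefVarError`. Inside a compiled GPU kernel the same mistake can show up only as an exception string in the generated IR, as in the line quoted above.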
Yikes! That was indeed the problem :)
@vchuravy thanks! :)
Well we shouldn't stall, but actually error... So something dastardly going on.
I've updated the MWE. Now there are no errors, but when Julia is started with more than one thread on the CPU it stalls. If Julia is started with only one thread, it completes fine.
CUDADevice is fine.
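Since the behaviour depends on the thread count, it may help to confirm how many threads the session actually has before running the MWE. A small sketch, assuming nothing beyond Base Julia (the messages are illustrative only):

```julia
# Report how many threads this Julia session was started with.
nthreads = Threads.nthreads()
println("Julia threads: ", nthreads)

if nthreads > 1
    println("Multi-threaded session: the CPU stall described above may reproduce.")
else
    println("Single-threaded session: the MWE is reported to complete.")
end
```

The thread count is fixed at startup, for example with `julia -t1 mwe.jl` or `julia -t16 mwe.jl`, where `mwe.jl` is a hypothetical file name for the script above.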
@pxl-th does this still error for you?
Hm... actually yes, just tried it on 1.10-beta2.
My bad, I forgot that you don't need KernelGradients now, so it installed an old version.
With the updated code it works.