KernelAbstractions.jl
`wait` sometimes fails
When `wait` is called on both a CPU and a GPU kernel event, synchronization can be unreliable: the test intermittently finds the GPU array still filled with zeros. The error message is:
any: Test Failed at /home/leo/jcode/lab/wait_fail.jl:34
Expression: Array(c1) ≈ ones(100)
Evaluated: Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] ≈ [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 … 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Stacktrace:
[1] top-level scope at /home/leo/jcode/lab/wait_fail.jl:34
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[3] top-level scope at /home/leo/jcode/lab/wait_fail.jl:31
Test Summary: | Pass Fail Total
any | 1 1 2
MWE (you may need to run the test 10 times or more to see the failure):
using CuArrays, KernelAbstractions
using Test
@kernel function ba_kernel(x)
    i = @index(Global)
    x[i] += 1
end

function f1()
    nthread = 4
    a = zeros(100)
    event1 = ba_kernel(CPU(), nthread)(a; ndrange=100)
    wait(event1)
    a
end

function f2()
    blockdim = 256
    x = CuArrays.zeros(100)
    event1 = ba_kernel(CUDA(), blockdim)(x; ndrange=100)
    wait(event1)
    x
end

@testset "fail sometime" begin
    r1 = f1()
    c1 = f2()
    @test r1 ≈ ones(100)
    @test Array(c1) ≈ ones(100)
end

@testset "always work" begin
    c1 = f2()
    @test Array(c1) ≈ ones(100)
end
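Since the failure is intermittent, it may help to run the check in a loop and count how often it fails instead of rerunning the file by hand. The following is only a sketch: `count_failures` is a hypothetical helper, not part of KernelAbstractions.jl or `Test`, and the GPU usage line is left commented out because it assumes the `f1`/`f2` definitions above and a working CUDA device.

```julia
# Hypothetical helper (not a library function): call a zero-argument
# check `trials` times and return how many calls returned false.
function count_failures(check; trials=20)
    failures = 0
    for _ in 1:trials
        check() || (failures += 1)
    end
    return failures
end

# CPU-only demonstration that runs anywhere.
println("failures: ", count_failures(() -> sum(ones(100)) ≈ 100; trials=10))

# Assumed usage with the MWE above (needs f1, f2, and a CUDA GPU):
# count_failures(() -> (f1(); Array(f2()) ≈ ones(100)); trials=50)
```

A nonzero count from the commented-out call would confirm the flaky behaviour and give a rough failure rate to compare between the REPL and other front ends.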
This is very peculiar. I haven't had time to investigate what is happening here.
I have been unable to reproduce this. Can you add some prints around here? https://github.com/JuliaGPU/KernelAbstractions.jl/blob/dd93c7abed46b53d89e8368ca747af8e616c5489/src/backends/cpu.jl#L63 We are just calling `Base.wait` there.
Also which version of Julia are you using?
@vchuravy I can't reproduce it in my REPL, but I can in Atom:
print!
Test Summary: | Pass Total
fail sometime | 2 2
print!
Test Summary: | Pass Total
fail sometime | 2 2
print!
Test Summary: | Pass Total
fail sometime | 2 2
print!
Test Summary: | Pass Total
fail sometime | 2 2
print!
Test Summary: | Pass Total
fail sometime | 2 2
print!
Test Summary: | Pass Total
fail sometime | 2 2
print!
Test Failed at /home/leo/.julia/dev/KernelAbstractions/waiterror.jl:29
Expression: Array(c1) ≈ ones(100)
Evaluated: Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] ≈ [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 … 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Stacktrace:
[1] top-level scope at /home/leo/.julia/dev/KernelAbstractions/waiterror.jl:29
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[3] top-level scope at /home/leo/.julia/dev/KernelAbstractions/waiterror.jl:26
Test Summary: | Pass Fail Total
fail sometime | 1 1 2
ERROR: Some tests did not pass: 1 passed, 1 failed, 0 errored, 0 broken.
So this might be an Atom-related issue.
What is Base.Threads.nthreads()?
Both Atom and the REPL return 1. @vchuravy
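For comparing the two environments, the Julia version and thread count asked about above can be collected in one go. This is a minimal sketch using only Base and the `InteractiveUtils` standard library; the variable name `nthreads_seen` is just for illustration. Running it once in the REPL and once in Atom would show whether the two sessions differ.

```julia
using InteractiveUtils  # provides versioninfo(); loaded automatically in the REPL

# Julia version, platform, and CPU information.
versioninfo()

# Number of Julia threads in this session.
nthreads_seen = Threads.nthreads()
println("Threads.nthreads() = ", nthreads_seen)

# The environment variable that sets the thread count at startup, if present.
println("JULIA_NUM_THREADS = ", get(ENV, "JULIA_NUM_THREADS", "(unset)"))
```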