KernelAbstractions.jl

`wait` sometimes fails

Open GiggleLiu opened this issue 5 years ago • 5 comments

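When you call `wait` on both a CPU event and a GPU event in the same test, the synchronization can be unreliable: the GPU array is sometimes still all zeros after `wait` returns. The error message is: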

any: Test Failed at /home/leo/jcode/lab/wait_fail.jl:34
  Expression: Array(c1) ≈ ones(100)
   Evaluated: Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] ≈ [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Stacktrace:
 [1] top-level scope at /home/leo/jcode/lab/wait_fail.jl:34
 [2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
 [3] top-level scope at /home/leo/jcode/lab/wait_fail.jl:31
Test Summary: | Pass  Fail  Total
any           |    1     1      2

MWE (the failure is intermittent, so you may need to run the test about 10 times or more):

using CuArrays, KernelAbstractions
using Test

@kernel function ba_kernel(x)
    i = @index(Global)
    x[i] += 1
end

function f1()
    nthread=4
    a = zeros(100)
    event1 = ba_kernel(CPU(), nthread)(a; ndrange=100)
    wait(event1)
    a
end

function f2()
    blockdim=256
    x = CuArrays.zeros(100)
    event1 = ba_kernel(CUDA(), blockdim)(x; ndrange=100)
    wait(event1)
    x
end

@testset "fail sometime" begin
    r1 = f1()
    c1 = f2()
    @test r1 ≈ ones(100)
    @test Array(c1) ≈ ones(100)
end

@testset "always work" begin
    c1 = f2()
    @test Array(c1) ≈ ones(100)
end
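Since the failure is intermittent, one way to repeat the check without rerunning the file by hand is a small counting loop. This is only a sketch: `count_failures` is a hypothetical helper, not part of KernelAbstractions or Test, and it uses a plain `Bool` check rather than `@testset` so that one failure does not abort the remaining runs.

```julia
# Hypothetical helper: call a zero-argument check `n` times
# and return how many times it returned false.
function count_failures(check, n)
    failures = 0
    for _ in 1:n
        check() || (failures += 1)
    end
    return failures
end

# Stand-in check so the snippet runs on its own; it always passes.
println(count_failures(() -> true, 5))
```

With the MWE definitions loaded, the flaky path would be exercised with something like `count_failures(20) do; f1(); Array(f2()) ≈ ones(100); end`.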

GiggleLiu avatar Apr 27 '20 04:04 GiggleLiu

This is very peculiar. Haven't had time to investigate what is happening here.

vchuravy avatar Apr 28 '20 20:04 vchuravy

I have been unable to reproduce this. Can you add some prints around here? https://github.com/JuliaGPU/KernelAbstractions.jl/blob/dd93c7abed46b53d89e8368ca747af8e616c5489/src/backends/cpu.jl#L63 We are just calling `Base.wait`.
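As a rough illustration of the kind of prints meant here, the sketch below wraps `Base.wait` on a plain `Task` and reports whether the task has finished before and after the call. `traced_wait` is a hypothetical name, and the snippet assumes the thing being waited on is a `Task`; it does not touch the KernelAbstractions source itself.

```julia
# Hypothetical helper: print task state around a call to Base.wait.
function traced_wait(task::Task)
    println("before wait: istaskdone = ", istaskdone(task))
    wait(task)  # the same Base.wait the comment above refers to
    println("after wait:  istaskdone = ", istaskdone(task))
    return task
end

# Stand-in workload so the snippet runs without a GPU.
t = @async sum(1:100)
traced_wait(t)
println(fetch(t))
```

If "after wait" ever printed `false`, that would point at the wait itself rather than at the kernel.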

Also which version of Julia are you using?

vchuravy avatar May 04 '20 15:05 vchuravy

@vchuravy I can't reproduce it in my REPL, only in Atom:

print!
Test Summary: | Pass  Total
fail sometime |    2      2
print!
Test Summary: | Pass  Total
fail sometime |    2      2
print!
Test Summary: | Pass  Total
fail sometime |    2      2
print!
Test Summary: | Pass  Total
fail sometime |    2      2
print!
Test Summary: | Pass  Total
fail sometime |    2      2
print!
Test Summary: | Pass  Total
fail sometime |    2      2
print!
Test Failed at /home/leo/.julia/dev/KernelAbstractions/waiterror.jl:29
  Expression: Array(c1) ≈ ones(100)
   Evaluated: Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] ≈ [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0  …  1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Stacktrace:
 [1] top-level scope at /home/leo/.julia/dev/KernelAbstractions/waiterror.jl:29
 [2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
 [3] top-level scope at /home/leo/.julia/dev/KernelAbstractions/waiterror.jl:26
Test Summary: | Pass  Fail  Total
fail sometime |    1     1      2
ERROR: Some tests did not pass: 1 passed, 1 failed, 0 errored, 0 broken.

So this might be an Atom-related issue.

GiggleLiu avatar May 04 '20 20:05 GiggleLiu

What is Base.Threads.nthreads()?
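For comparing the two environments, a few lines like the following can be run in both Atom and the REPL. This is a minimal sketch that uses only Base; `JULIA_NUM_THREADS` is the standard environment variable Julia reads at startup.

```julia
# Report the settings that may differ between Atom and a plain REPL.
println("Julia version:     ", VERSION)
println("Threads.nthreads:  ", Threads.nthreads())
println("JULIA_NUM_THREADS: ", get(ENV, "JULIA_NUM_THREADS", "(unset)"))
```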

vchuravy avatar May 05 '20 23:05 vchuravy

Both Atom and the REPL return 1. @vchuravy

GiggleLiu avatar May 06 '20 21:05 GiggleLiu