
@print functionality inconsistent with CUDADevice()

Open leios opened this issue 5 years ago • 6 comments

This is more for my own reference later.

If I create the following kernel:

@kernel function check_kernel()
    cI = @index(Global)
    if cI == 2
        @print("cI is: ", cI, '\n')
    end
end

function check(use_gpu)
    if use_gpu
        kernel! = check_kernel(CUDADevice(), 256)
    else
        kernel! = check_kernel(CPU(),8)
    end
    kernel!(ndrange=256) 
end

I find odd behavior in the REPL:

julia> include("src/check.jl")
cI is: 2KernelAbstractions
.CudaEvent(CuEvent(Ptr{Nothing} @0x000055d1cee5f4c0, CuContext(Ptr{Nothing} @0x000055d1ce966110)))

julia> wait(check(true))

julia> wait(check(false))
cI is: 2

julia> exit()
cI is: 2

For some reason, output from kernels run on a CUDADevice() only appears when exiting the REPL.

I guess there is some way to make the wait() function also flush the @print output, but it might take some digging.
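A hedged workaround sketch for the behavior above (assumes CUDA.jl is loaded and a GPU is available; whether a device-level sync actually flushes the buffer is the open question in this thread):

```julia
using CUDA, KernelAbstractions

ev = check(true)     # launch the kernel from the snippet above
wait(ev)             # wait on the returned event
CUDA.synchronize()   # device-level synchronization; per the CUDA docs this
                     # should flush the device-side printf buffer
```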

leios avatar Jul 10 '20 15:07 leios

This buffer is flushed only for

the start of a kernel launch
synchronization (e.g. cudaDeviceSynchronize())
blocking memory copies (e.g. cudaMemcpy(...))
module load/unload
context destruction

An important thing to note is that this list does not include program exit. If the call to cudaDeviceSynchronize() was removed from the example program above, then we would see no output.

http://15418.courses.cs.cmu.edu/spring2013/article/15

vchuravy avatar Jul 10 '20 15:07 vchuravy

Ok, bear with me.

I added @synchronize() to the kernel and it didn't seem to change anything. I then updated the @print macro's CUDA backend to also synchronize.

If I run the check() function n times, I get n outputs on Julia exit, which leads me to believe this is more of a Julia thing than a CUDA thing.

Could it be that the @print macro sends its output to another printing thread that @show doesn't see?

leios avatar Jul 10 '20 15:07 leios

It is definitely a CUDA thing. The only thing we do is call vprintf on the device; the output is then managed by CUDA. The synchronize mentioned on the website I found is a CUDA.synchronize(), which performs a device-level synchronization, not a workgroup-level synchronization like @synchronize. The reason you get the output on exit from Julia is probably that we intentionally destroy the context, and context destruction is one of the flush triggers listed above.

vchuravy avatar Jul 10 '20 16:07 vchuravy

Ok, but it looks like wait already calls that synchronize.

From the cuda backend:

function wait(::CPU, ev::CudaEvent, progress=yield)
    if progress === nothing
        CUDA.synchronize(ev.event)  # block until this event completes
    else
        while !isdone(ev)           # poll the event, yielding in between
            progress()
        end
    end
end

I was trying CUDA.sync_threads() in the overdubbed print macro, which I thought would do the same thing?

leios avatar Jul 10 '20 16:07 leios

I was trying CUDA.sync_threads() in the overdubbed print macro, which I thought would do the same thing?

No, that is still a within-kernel synchronize.

CUDA.synchronize(ev.event) is an event synchronize, whereas I think you need a device-level CUDA.synchronize(): https://github.com/JuliaGPU/CUDA.jl/blob/4ea9cabea7ab7da58b7bf73d946b60a604e3b445/lib/cudadrv/context.jl#L151
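To illustrate the distinction (a hedged sketch, assuming CUDA.jl and a GPU; the comments summarize this thread's discussion, not verified behavior):

```julia
using CUDA

# Three different "synchronize" operations, at three different levels:
#
# 1. CUDA.sync_threads() / KernelAbstractions' @synchronize:
#    a barrier *inside* a kernel, across threads of one block/workgroup.
#    Does not flush the device-side printf buffer.
#
# 2. CUDA.synchronize(ev):
#    waits for one recorded CuEvent to complete.
#
# 3. CUDA.synchronize():
#    blocks until all outstanding work on the device is done and, per the
#    CUDA documentation, flushes the device-side printf buffer as a side
#    effect.
CUDA.synchronize()
```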

vchuravy avatar Jul 10 '20 16:07 vchuravy

Ah, tried that one too, but got a spectacular error message. I'll poke around more after* an upcoming meeting

  • after = during.

leios avatar Jul 10 '20 16:07 leios