KernelAbstractions.jl
@print functionality inconsistent with CUDADevice()
This is more for my own reference later.
If I create the following kernel:
@kernel function check_kernel()
    cI = @index(Global)
    if cI == 2
        @print("cI is: ", cI, '\n')
    end
end

function check(use_gpu)
    if use_gpu
        kernel! = check_kernel(CUDADevice(), 256)
    else
        kernel! = check_kernel(CPU(), 8)
    end
    kernel!(ndrange=256)
end
I find odd behavior in the REPL:
julia> include("src/check.jl")
cI is: 2KernelAbstractions
.CudaEvent(CuEvent(Ptr{Nothing} @0x000055d1cee5f4c0, CuContext(Ptr{Nothing} @0x000055d1ce966110)))
julia> wait(check(true))
julia> wait(check(false))
cI is: 2
julia> exit()
cI is: 2
For some reason, output from kernels run on a CUDADevice() only appears when exiting the REPL.
I guess there is some way to make the wait() function also flush the @print output, but it might take some digging.
This buffer is flushed only for:
- the start of a kernel launch
- synchronization (e.g. cudaDeviceSynchronize())
- blocking memory copies (e.g. cudaMemcpy(...))
- module load/unload
- context destruction
An important thing to note is that this list does not include program exit. If the call to cudaDeviceSynchronize() was removed from the example program above, then we would see no output.
http://15418.courses.cs.cmu.edu/spring2013/article/15
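Based on that flush list, one possible workaround at the call site would be to force a device-level synchronization after the kernel finishes. This is an untested sketch, not confirmed behavior; it assumes the `check` function from above and that `CUDA.synchronize()` triggers the same buffer flush as `cudaDeviceSynchronize()`:

```julia
# Hypothetical workaround (sketch): force a device-level sync after the
# kernel event completes, which should flush the device printf buffer.
using CUDA, KernelAbstractions

ev = check(true)     # launch check_kernel on the CUDADevice()
wait(ev)             # wait on the kernel's event
CUDA.synchronize()   # device-level synchronization; should flush @print output
```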
Ok, bear with me.
I added @synchronize() to the kernel and it didn't seem to change anything. I then updated the @print macro's CUDA backend to also synchronize.
If I run the check() function n times, I get n outputs on Julia exit, which leads me to believe this is more of a Julia thing than a CUDA thing.
Could the @print macro be sending its output to another printing stream that @show doesn't see?
It is definitely a CUDA thing. The only thing we do is call vprintf on the device; the output is then managed by CUDA. The synchronize mentioned on the website I found is a CUDA.synchronize(), which performs a device-level synchronization, not a warp-level synchronization like @synchronize. The reason you get the output on exit from Julia is probably that we intentionally destroy the context.
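The same buffering can be observed with CUDA.jl alone, without KernelAbstractions. A minimal sketch (untested, assumes a working CUDA.jl installation and that `@cuprintln` is the printing macro, as in recent CUDA.jl versions):

```julia
# Sketch: reproduce the buffering with plain CUDA.jl. Output from
# @cuprintln goes into the device printf buffer and is only shown
# once something flushes it, e.g. a device-level synchronize.
using CUDA

function raw_print_kernel()
    CUDA.@cuprintln("hello from thread ", threadIdx().x)
    return
end

@cuda threads=4 raw_print_kernel()
CUDA.synchronize()  # without this, the output may not appear until context destruction
```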
Ok, but it looks like wait already called that synchronize.
From the cuda backend:
function wait(::CPU, ev::CudaEvent, progress=yield)
    if progress === nothing
        CUDA.synchronize(ev.event)
    else
        while !isdone(ev)
            progress()
        end
    end
end
I was trying CUDA.sync_threads() in the overdubbed print macro, which I thought would do the same thing?
No, that is still a within-kernel synchronize.
CUDA.synchronize(ev.event) is an event-level synchronize, whereas I think you need a device-level CUDA.synchronize(): https://github.com/JuliaGPU/CUDA.jl/blob/4ea9cabea7ab7da58b7bf73d946b60a604e3b445/lib/cudadrv/context.jl#L151
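The distinction could be sketched as a hypothetical variant of the backend's wait path. This is an illustration only, not the actual fix; the name `wait_and_flush` is made up, and it assumes the `CudaEvent` type from the backend code above:

```julia
# Hypothetical sketch: wait on the event as before, then add a
# device-level synchronize, which should also flush the printf buffer.
function wait_and_flush(ev::CudaEvent)
    CUDA.synchronize(ev.event)  # event-level: blocks until the kernel's event fires
    CUDA.synchronize()          # device-level: side effect is flushing vprintf output
end
```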
Ah, tried that one too, but got a spectacular error message. I'll poke around more after* an upcoming meeting.
* after = during.