AMDGPU.jl
HSA memory test hangs the GPU in CI
Testing the AMDGPU.Mem.unsafe_copy3d! function (#220) may hang the GPU in the BuildKite CI. No issue is observed outside of CI.
The current workaround is to add an operation before the call to amd_memory_async_copy_rect. Tested variants are a sleep, a println, or (as the code does now) assigning the signal to sig:
https://github.com/JuliaGPU/AMDGPU.jl/blob/cfaade146977594bf18e14b285ee3a9c84fbc7f2/src/memory.jl#L394-L397
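For readers who do not want to follow the link, here is a minimal sketch of the shape of the workaround. It does not use the AMDGPU.jl or HSA API: fake_copy_rect, copy_with_workaround, make_signal and CALL_LOG are hypothetical stand-ins, and the real code in src/memory.jl may look different. It only illustrates "do something with the signal before issuing the copy".

```julia
# Hypothetical stand-ins: none of these names come from AMDGPU.jl.
const CALL_LOG = String[]

# Stands in for the real amd_memory_async_copy_rect call; it only records that it ran.
function fake_copy_rect(signal)
    push!(CALL_LOG, "copy_rect")
    return signal
end

function copy_with_workaround(make_signal; delay = 0.01)
    # Variant currently used: assign the signal to a local before the copy call.
    sig = make_signal()
    push!(CALL_LOG, "signal_bound")
    # Other tested variants: an intervening sleep or println before the call.
    sleep(delay)
    return fake_copy_rect(sig)
end
```

Calling copy_with_workaround(() -> :signal) runs the assignment and the sleep before the stand-in copy call. Why any of these operations avoids the hang in CI is not established, so this should be read as a description of the workaround, not a fix.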
A potential cause is some instability related to HSASignal. This may also be related to #208.