AMDGPU.jl
Guidance on randn hanging on MI250X
The question is three-fold:
- What's the best way to generate random numbers (scalars, not arrays) in device kernel code?
- When trying `AMDGPU.randn(eltype(u), 1)` or `randn(eltype(u), 1)` (from Base), AMDGPU.jl hangs on the Crusher MI250X using `rocm/5.4.0` (no error, the job just hangs until it times out after 3 to 4 minutes). Without those calls the code runs fine, but the problem requires random numbers on the device.
Any guidance is appreciated and thanks for the great job.
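To make the device-side part of the question concrete, here is a minimal sketch of one common approach: each thread carries its own small RNG state and draws scalars with plain arithmetic, so nothing allocates. This is not AMDGPU.jl's API. The names `xorshift32`, `seed_state`, `to_unit`, and `rand_normal` are hypothetical helpers, and xorshift32 with Box-Muller is a stand-in chosen for brevity rather than statistical quality. The code below runs on the CPU; inside a kernel, `idx` would presumably be the global thread index.

```julia
# Hypothetical helpers (not part of AMDGPU.jl): an allocation-free scalar RNG
# built only from integer and floating-point arithmetic.

const INV_2_24 = 1f0 / 16777216f0   # 2^-24 as a Float32

# One xorshift32 step. The state must be nonzero.
@inline function xorshift32(state::UInt32)
    state ⊻= state << 13
    state ⊻= state >> 17
    state ⊻= state << 5
    return state
end

# Mix a seed with a thread index to get a nonzero per-thread starting state.
@inline function seed_state(seed::UInt32, idx::Integer)
    s = seed ⊻ ((idx % UInt32) * 0x9e3779b9)   # wrapping multiply
    s = s == 0x00000000 ? 0x00000001 : s        # xorshift must not start at 0
    return xorshift32(xorshift32(s))            # two warm-up steps
end

# Map the top 24 bits of the state to a Float32 in (0, 1].
@inline to_unit(state::UInt32) = (Float32(state >> 8) + 1f0) * INV_2_24

# Draw one standard-normal Float32 via Box-Muller.
# Returns the sample and the advanced state.
@inline function rand_normal(state::UInt32)
    s1 = xorshift32(state)
    s2 = xorshift32(s1)
    u1 = to_unit(s1)
    u2 = to_unit(s2)
    z = sqrt(-2f0 * log(u1)) * cos(2f0 * Float32(pi) * u2)
    return z, s2
end

# CPU stand-in for a kernel: every "thread" idx gets an independent stream.
function demo(seed::UInt32, nthreads::Int)
    out = Vector{Float32}(undef, nthreads)
    for idx in 1:nthreads
        state = seed_state(seed, idx)
        z, state = rand_normal(state)
        out[idx] = z
    end
    return out
end

samples = demo(0x12345678, 4)
```

The design choice is that the state is an ordinary `UInt32` passed in and returned, so the functions stay pure and type-stable, which is usually what GPU compilers need. For real workloads a counter-based generator with better statistical properties would be a safer pick than xorshift32.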
Can you split the issue into the device-side RNG and the host-side hang? They are unrelated and likely need different work to fix.
@vchuravy Will do, I changed the description and will open a separate issue. Thanks!
@williamfgc what stacktrace do you get if you Ctrl-C a hanging randn call?