`hipStreamDestroy` hangs
Hi! What could cause `hipStreamDestroy` to hang while waiting for an event?
:3:hip_stream.cpp :464 : 2813836027 us: [pid:29129 tid:0x7fec4bfadd00] hipStreamDestroy ( stream:0x1688950 )
:4:command.cpp :349 : 2813836030 us: [pid:29129 tid:0x7fec4bfadd00] Command (Marker) enqueued: 0x12980d0
:3:rocvirtual.cpp :455 : 2813836036 us: [pid:29129 tid:0x7fec4bfadd00] Set Handler: handle(0x7febc7a86d80), timestamp(0x1249eb0)
:4:rocvirtual.cpp :1011: 2813836040 us: [pid:29129 tid:0x7fec4bfadd00] HWq=0x7fea6c200000, BarrierAND Header = 0x1503 (type=3, barrier=1, acquire=2, release=2), dep_signal=[0x0, 0x0, 0x0, 0x0, 0x0], completion_signal=0x7febc7a86d80
:4:command.cpp :253 : 2813836042 us: [pid:29129 tid:0x7fec4bfadd00] Waiting for event 0x12980d0 to complete, current status 2
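For readers following the log, the `BarrierAND Header = 0x1503` value can be unpacked by hand. The sketch below assumes the standard 16-bit AQL packet header layout from the HSA runtime headers (type in bits 0-7, barrier in bit 8, acquire fence scope in bits 9-10, release fence scope in bits 11-12). The struct and function names are made up for illustration and are not part of HIP or ROCclr.

```cpp
#include <cstdint>

// Hypothetical helper type: the fields of a 16-bit AQL packet header.
struct AqlHeader {
    unsigned type;     // bits 0-7: packet type (3 is a barrier-AND packet)
    unsigned barrier;  // bit 8: wait for all earlier packets in the queue
    unsigned acquire;  // bits 9-10: acquire fence scope (2 is system scope)
    unsigned release;  // bits 11-12: release fence scope (2 is system scope)
};

// Split a raw header value into its fields (assumed layout, see lead-in).
constexpr AqlHeader decode_aql_header(std::uint16_t h) {
    return AqlHeader{
        static_cast<unsigned>(h & 0xFFu),
        static_cast<unsigned>((h >> 8) & 0x1u),
        static_cast<unsigned>((h >> 9) & 0x3u),
        static_cast<unsigned>((h >> 11) & 0x3u)};
}

// The header value from the log decodes to the fields the log prints.
constexpr AqlHeader kLogged = decode_aql_header(0x1503);
static_assert(kLogged.type == 3, "type=3");
static_assert(kLogged.barrier == 1, "barrier=1");
static_assert(kLogged.acquire == 2, "acquire=2");
static_assert(kLogged.release == 2, "release=2");
```

Read this way, the marker is a barrier packet with no dependency signals (`dep_signal` is all zeros), so it should complete as soon as the hardware queue processes it. The host thread is then waiting only on `completion_signal`, which suggests the queue itself has stopped making progress.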
This happens consistently on an RX7900XT with ROCm 5.6 and 5.7 on Linux 5.19 and 6.2. An RX6700XT, for example, does not have these issues at all.
The MWE (written in Julia using AMDGPU.jl) just creates 1 low-priority stream, 4 normal-priority streams, and 4 high-priority streams, and then destroys them.
These are direct ccalls, so there should be nothing Julia-specific involved.
Also, the same code does not reproduce the hang on Windows.
In fact, the RX7900XT works fine on Windows and passes all the tests we have in AMDGPU.jl.
MWE:
using AMDGPU

function main()
    sl1 = Ref{AMDGPU.HIP.hipStream_t}()
    AMDGPU.HIP.hipStreamCreateWithPriority(sl1, 0, 1) |> AMDGPU.check
    s1 = Ref{AMDGPU.HIP.hipStream_t}()
    AMDGPU.HIP.hipStreamCreateWithPriority(s1, 0, 0) |> AMDGPU.check
    s2 = Ref{AMDGPU.HIP.hipStream_t}()
    AMDGPU.HIP.hipStreamCreateWithPriority(s2, 0, 0) |> AMDGPU.check
    s3 = Ref{AMDGPU.HIP.hipStream_t}()
    AMDGPU.HIP.hipStreamCreateWithPriority(s3, 0, 0) |> AMDGPU.check
    s4 = Ref{AMDGPU.HIP.hipStream_t}()
    AMDGPU.HIP.hipStreamCreateWithPriority(s4, 0, 0) |> AMDGPU.check
    sh1 = Ref{AMDGPU.HIP.hipStream_t}()
    AMDGPU.HIP.hipStreamCreateWithPriority(sh1, 0, -1) |> AMDGPU.check
    sh2 = Ref{AMDGPU.HIP.hipStream_t}()
    AMDGPU.HIP.hipStreamCreateWithPriority(sh2, 0, -1) |> AMDGPU.check
    sh3 = Ref{AMDGPU.HIP.hipStream_t}()
    AMDGPU.HIP.hipStreamCreateWithPriority(sh3, 0, -1) |> AMDGPU.check
    sh4 = Ref{AMDGPU.HIP.hipStream_t}()
    AMDGPU.HIP.hipStreamCreateWithPriority(sh4, 0, -1) |> AMDGPU.check

    AMDGPU.HIP.hipStreamDestroy(sl1[]) |> AMDGPU.check
    AMDGPU.HIP.hipStreamDestroy(s1[]) |> AMDGPU.check
    AMDGPU.HIP.hipStreamDestroy(s2[]) |> AMDGPU.check
    AMDGPU.HIP.hipStreamDestroy(s3[]) |> AMDGPU.check
    AMDGPU.HIP.hipStreamDestroy(s4[]) |> AMDGPU.check
    AMDGPU.HIP.hipStreamDestroy(sh1[]) |> AMDGPU.check
    AMDGPU.HIP.hipStreamDestroy(sh2[]) |> AMDGPU.check
    AMDGPU.HIP.hipStreamDestroy(sh3[]) |> AMDGPU.check
    AMDGPU.HIP.hipStreamDestroy(sh4[]) |> AMDGPU.check
end

main()
Here's a C++ reproducer:
#include <hip/hip_runtime.h>
#include <iostream>

// Report any HIP call that does not return hipSuccess.
void check(hipError_t res) {
    if (res != hipSuccess) {
        std::cerr << "Fail: " << hipGetErrorString(res) << std::endl;
    }
}

int main() {
    // One low-priority stream (priority 1).
    hipStream_t sl1;
    check(hipStreamCreateWithPriority(&sl1, 0, 1));

    // Four normal-priority streams (priority 0).
    hipStream_t s1;
    check(hipStreamCreateWithPriority(&s1, 0, 0));
    hipStream_t s2;
    check(hipStreamCreateWithPriority(&s2, 0, 0));
    hipStream_t s3;
    check(hipStreamCreateWithPriority(&s3, 0, 0));
    hipStream_t s4;
    check(hipStreamCreateWithPriority(&s4, 0, 0));

    // Four high-priority streams (priority -1).
    hipStream_t sh1;
    check(hipStreamCreateWithPriority(&sh1, 0, -1));
    hipStream_t sh2;
    check(hipStreamCreateWithPriority(&sh2, 0, -1));
    hipStream_t sh3;
    check(hipStreamCreateWithPriority(&sh3, 0, -1));
    hipStream_t sh4;
    check(hipStreamCreateWithPriority(&sh4, 0, -1));

    // Destroy the streams in creation order; the hang occurs in hipStreamDestroy.
    check(hipStreamDestroy(sl1));
    check(hipStreamDestroy(s1));
    check(hipStreamDestroy(s2));
    check(hipStreamDestroy(s3));
    check(hipStreamDestroy(s4));
    check(hipStreamDestroy(sh1));
    check(hipStreamDestroy(sh2));
    check(hipStreamDestroy(sh3));
    check(hipStreamDestroy(sh4));
    return 0;
}
In this MWE, using different priorities is important, but that might not be the only trigger. If I skip the tests that exercise priorities, the test suite still hangs, but that case is harder to narrow down.
I will work with the C++ example, since I do not have a Julia installation with AMDGPU support.
I cannot seem to reproduce this. Can you share a bit more information, such as the ROCm version you are using and the GPU name?
Also, regarding the log level: can you share the entire log for this sample? It might help us trace the command that is getting stuck.
Do tell if you need any other info.
- GPU: RX7900XT (gfx1100)
- ROCm: 5.7.1 (from amdgpu-install script)
- OS: Ubuntu 22.04
- Kernel: 6.2.0-37-generic
Here's the full log from the C++ MWE: log.txt
Actually, after rebooting the machine, neither the Julia nor the C++ MWE reproduces the hang.
But once you run the AMDGPU.jl tests, they hang in `hipStreamDestroy`, and after that both MWEs reproduce the hang again.
The dmesg output is full of page faults after running the tests:
[ 2367.046840] amdgpu 0000:2f:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 2367.046848] amdgpu 0000:2f:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 2367.046851] amdgpu 0000:2f:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B3A
[ 2367.046853] amdgpu 0000:2f:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0xeb1d0000 flags=0x0000]
[ 2367.046854] amdgpu 0000:2f:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 2367.046858] amdgpu 0000:2f:00.0: amdgpu: MORE_FAULTS: 0x0
[ 2367.046860] amdgpu 0000:2f:00.0: amdgpu: WALKER_ERROR: 0x5
[ 2367.046862] amdgpu 0000:2f:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 2367.046865] amdgpu 0000:2f:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 2367.046867] amdgpu 0000:2f:00.0: amdgpu: RW: 0x0
[ 2367.046914] amdgpu 0000:2f:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0xeb1d0000 flags=0x0020]
[ 2371.109083] amdgpu 0000:2f:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0xeb1d0000 flags=0x0020]
[ 2376.497646] amdgpu 0000:2f:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 2376.497653] amdgpu 0000:2f:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 2376.497657] amdgpu 0000:2f:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B3A
[ 2376.497659] amdgpu 0000:2f:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 2376.497662] amdgpu 0000:2f:00.0: amdgpu: MORE_FAULTS: 0x0
[ 2376.497664] amdgpu 0000:2f:00.0: amdgpu: WALKER_ERROR: 0x5
[ 2376.497667] amdgpu 0000:2f:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 2376.497669] amdgpu 0000:2f:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 2376.497671] amdgpu 0000:2f:00.0: amdgpu: RW: 0x0
[ 2376.497833] amdgpu 0000:2f:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0xeb1d0000 flags=0x0000]
[ 2376.497846] amdgpu 0000:2f:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0xeb1d0000 flags=0x0020]
[ 2392.170827] amdgpu 0000:2f:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0xeb1d0000 flags=0x0020]
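To make the relationship between the raw `GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B3A` value and the per-field lines easier to see, here is a minimal decoding sketch. The bit positions are an assumption: they are chosen so that the decoded fields match the breakdown the driver printed above, and they are not taken from any register documentation in this thread. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper type: fields of the fault status word (assumed layout).
struct FaultStatus {
    unsigned more_faults;        // bit 0
    unsigned walker_error;       // bits 1-3
    unsigned permission_faults;  // bits 4-7
    unsigned mapping_error;      // bit 8
    unsigned client_id;          // bits 9-17
    unsigned rw;                 // bit 18
};

// Extract `width` bits of `v` starting at bit `shift`.
constexpr unsigned bits(std::uint32_t v, unsigned shift, unsigned width) {
    return (v >> shift) & ((1u << width) - 1u);
}

// Split the raw status word into its fields.
constexpr FaultStatus decode_fault_status(std::uint32_t s) {
    return FaultStatus{bits(s, 0, 1), bits(s, 1, 3), bits(s, 4, 4),
                       bits(s, 8, 1), bits(s, 9, 9), bits(s, 18, 1)};
}

// The value from dmesg decodes to the same fields the driver reported.
constexpr FaultStatus kFault = decode_fault_status(0x00000B3A);
static_assert(kFault.more_faults == 0x0, "MORE_FAULTS: 0x0");
static_assert(kFault.walker_error == 0x5, "WALKER_ERROR: 0x5");
static_assert(kFault.permission_faults == 0x3, "PERMISSION_FAULTS: 0x3");
static_assert(kFault.mapping_error == 0x1, "MAPPING_ERROR: 0x1");
static_assert(kFault.client_id == 0x5, "client ID: CPC (0x5)");
static_assert(kFault.rw == 0x0, "RW: 0x0");
```

Both fault reports carry the same status word and the same faulting address (`0x0000000000000000`), reported against the CPC client.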
With HIP rebuilt in debug mode, we get a more detailed backtrace:
(gdb) bt
#0 0x00007f9791508c9b in sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1 0x00007f96af332cd1 in amd::Os::yield () at /home/pxl-th/code/clr/rocclr/os/os_posix.cpp:418
#2 0x00007f96af33cad0 in amd::Event::awaitCompletion (this=0x28d63f0) at /home/pxl-th/code/clr/rocclr/platform/command.cpp:258
#3 0x00007f96af3443a1 in amd::HostQueue::terminate (this=0x1fba470) at /home/pxl-th/code/clr/rocclr/platform/commandqueue.cpp:67
#4 0x00007f96af36f403 in amd::ReferenceCountedObject::release (this=0x1fba470) at /home/pxl-th/code/clr/rocclr/platform/runtime.cpp:116
#5 0x00007f96af254028 in hip::Stream::Destroy (stream=0x1fba470) at /home/pxl-th/code/clr/hipamd/src/hip_stream.cpp:83
#6 0x00007f96af25bd68 in hipStreamDestroy (stream=0x1fba470) at /home/pxl-th/code/clr/hipamd/src/hip_stream.cpp:496
#7 0x00007f979011a307 in ?? ()
#8 0x0000000001170028 in ?? ()
#9 0x0000000000000000 in ?? ()
I just switched back to the RX6700XT and there are no page faults at all, and everything works fine.
There is only an occasional:
[ 130.457568] amdgpu: Runlist is getting oversubscribed. Expect reduced ROCm performance.
Steps to reproduce with Julia, just in case:
- Download & unpack Julia 1.10: https://julialang-s3.julialang.org/bin/linux/x64/1.10/julia-1.10.0-rc1-linux-x86_64.tar.gz
- Launch the Julia REPL with `<julia-dir>/bin/julia --threads=auto`.
- Enter package mode with the `]` key.
- Add the AMDGPU.jl package: `add AMDGPU#master`.
- Run the AMDGPU tests with `test AMDGPU` (the tests usually take ~10-15 minutes to complete).
- At some point the tests will stop printing anything, which means one of the workers has hung.
- Run the C++ MWE.
Running the AMDGPU.jl tests with the HIP debug build gives the following:
┌ Error: Worker(pid=77537, terminated=true, termsignal=6) died running test item "kernelabstractions". Recording test error.
└ @ ReTestItems ~/.julia/packages/ReTestItems/HZCMZ/src/ReTestItems.jl:544
Captured logs for test setup "TSGPUArrays" (dependency of "gpuarrays - indexing find") at test/gpuarrays_tests.jl:1 on worker 77565
julia: /home/pxl-th/code/clr/rocclr/os/os_posix.cpp:310: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
[77568] signal (6.-6): Aborted
in expression starting at /home/pxl-th/.julia/dev/AMDGPU/test/gpuarrays_tests.jl:4
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fd05c82871a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
currentStackInfo at /home/pxl-th/code/clr/rocclr/os/os_posix.cpp:310
HostThread at /home/pxl-th/code/clr/rocclr/thread/thread.cpp:34
init at /home/pxl-th/code/clr/rocclr/thread/thread.cpp:170
init at /home/pxl-th/code/clr/rocclr/os/os_posix.cpp:170
init at /home/pxl-th/code/clr/rocclr/os/os_posix.cpp:136
@pxl-th Apologies for the lack of response. Can you please test with latest ROCm 6.1.0 (HIP 6.1)? If resolved, please close ticket. Thanks!
Haven't seen hangs in a while, I think this is resolved! Thanks!