Anton Smirnov

Results 213 comments of Anton Smirnov

Mixing default and non-default streams in `hip*Async` functions seems to cause hangs. Here's a C++ reproducer: ```cpp #include #include void fn() { hipStream_t stream; hipStreamCreate(&stream); int n_elements = 1024 * 1024;...

Kind ping, to see if someone can take a look at the issue.

@torrance thanks for the update! This should significantly help with CI in AMDGPU.jl

Here's a C++ reproducer: ```cpp #include #include using namespace std; void check(int res) { if (res != 0) { std::cerr

In this MWE it is important to have different priorities, but that might not be the only reproducer. If I don't run tests that test priorities, then it also hangs,...

Do tell if you need any other info.

- GPU: RX7900XT (gfx1100)
- ROCm: 5.7.1 (from amdgpu-install script)
- OS: Ubuntu 22.04
- Kernel: 6.2.0-37-generic

Here's the full log from the C++ MWE: [log.txt](https://github.com/ROCm-Developer-Tools/HIP/files/13448536/log.txt)

Actually, after rebooting the machine, neither the Julia nor the C++ MWE reproduces the hang. But once you run the AMDGPU.jl tests, they hang in `hipStreamDestroy`, and after that all MWEs reproduce it again.

`dmesg` output is full of page faults after running tests:

```
[ 2367.046840] amdgpu 0000:2f:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process pid 0 thread pid 0)
...
```

Rebuilding HIP in debug mode, we get a more detailed trace:

```
(gdb) bt
#0  0x00007f9791508c9b in sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1  0x00007f96af332cd1 in amd::Os::yield () at /home/pxl-th/code/clr/rocclr/os/os_posix.cpp:418
#2  0x00007f96af33cad0...
```

> `dmesg` output is full of page faults after running tests:
>
> ```
> [ 2367.046840] amdgpu 0000:2f:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:0 pasid:0, for process pid...
> ```