hipamd
ROCm 5.3 gfx1030 hang with hipStreamCreate and hipStreamDestroy
The following test hangs with ROCm 5.3 on the gfx1030 architecture (AMD Radeon PRO V620).
#include <hip/hip_runtime.h>
#include <cstdio>

int main()
{
    printf("starting..\n");

    hipStream_t stream;
    hipStreamCreate(&stream);
    hipStreamDestroy(stream);

    hipStream_t stream2;
    hipStreamCreateWithFlags(&stream2, hipStreamNonBlocking);
    hipStreamDestroy(stream2);

    printf("finished!\n");
}
Ran with
hipcc test.cpp
./a.out
Built and executed in the Docker image rocm/rocm-terminal. hipconfig reports:
HIP version : 5.3.22061-e8e78f1a
== hipconfig
HIP_PATH : /opt/rocm-5.3.0
ROCM_PATH : /opt/rocm-5.3.0
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME : rocclr
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.3.0/include -I/opt/rocm-5.3.0/llvm/bin/../lib/clang/15.0.0 -I/opt/rocm-5.3.0/hsa/include
== hip-clang
HSA_PATH : /opt/rocm-5.3.0/hsa
HIP_CLANG_PATH : /opt/rocm-5.3.0/llvm/bin
AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.3.0 22362 3cf23f77f8208174a2ee7c616f4be23674d7b081)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.3.0/llvm/bin
AMD LLVM version 15.0.0git
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: znver3
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
-std=c++11 -isystem "/opt/rocm-5.3.0/llvm/lib/clang/15.0.0/include/.." -isystem /opt/rocm-5.3.0/hsa/include -isystem "/opt/rocm-5.3.0/include" -O3
-L"/opt/rocm-5.3.0/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt
=== Environment Variables
PATH=/home/rocm-user/.vscode-server/bin/129500ee4c8ab7263461ffe327268ba56b9f210d/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/rocm/bin
== Linux Kernel
Hostname : fb5ed677a12b
Linux fb5ed677a12b 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
The test works with ROCm 5.2 and with ROCm 5.3 on other architectures.
The issue is also reproducible with

#include <hip/hip_runtime.h>
#include <cstdio>

int main()
{
    printf("starting..\n");

    hipStream_t stream;
    hipStreamCreate(&stream);
    hipStreamDestroy(stream);

    hipStream_t stream2;
    hipStreamCreate(&stream2);
    hipStreamDestroy(stream2);

    printf("finished!\n");
}

so it is not specific to hipStreamCreateWithFlags.
The issue is reproducible on a system with two gfx1030 cards. It is not reproducible on a system with only one: if I start a container from the rocm/rocm-terminal:5.3 image and pass through only one card, the example works as expected. The issue is also not reproducible on a system with two gfx908 cards.
I was able to reproduce the hang in a 5.3 Docker container (rocm/rocm-terminal:5.3) before updating the host system to 5.3, but not after.
It looks like an additional requirement for triggering the hang is running the ROCm 5.2 kernel module together with the 5.3 runtime (typically a new Docker container on a host that has not yet been updated).
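In case it helps with checking other host and container combinations without ending up with a stuck process each time, here is a minimal sketch of a wrapper that runs a test body in a forked child and gives up after a timeout. Everything in it (run_with_timeout, RunResult, the 10 ms polling interval) is a hypothetical helper written for illustration, not part of HIP or ROCm, and it uses only POSIX calls so it builds without the runtime. It is untested against the actual hang; a child blocked inside the kernel driver may not go away even after SIGKILL, which is why the final reap is non-blocking.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Outcome of running a task in a child process under a time limit.
enum class RunResult { Finished, Failed, TimedOut };

// Hypothetical helper: runs `task` in a forked child and waits at most
// `timeout` for the child to exit. Not part of HIP or ROCm.
RunResult run_with_timeout(const std::function<void()>& task,
                           std::chrono::milliseconds timeout)
{
    assert(task && "task must be callable");

    const pid_t child = fork();
    if (child < 0) {
        return RunResult::Failed;      // fork itself failed
    }
    if (child == 0) {
        task();                        // a hang here only blocks the child
        _exit(0);                      // leave without running parent's atexit handlers
    }

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        int status = 0;
        const pid_t reaped = waitpid(child, &status, WNOHANG);
        if (reaped == child) {
            // Child ended: success only on a normal exit with status 0.
            const bool ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
            return ok ? RunResult::Finished : RunResult::Failed;
        }
        if (reaped < 0) {
            return RunResult::Failed;  // waitpid reported an error
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            break;                     // still running at the deadline
        }
        usleep(10 * 1000);             // poll roughly every 10 ms
    }

    // Treat the overrun as a hang: request a kill and make a single
    // non-blocking attempt to reap, so the parent never blocks here.
    kill(child, SIGKILL);
    waitpid(child, nullptr, WNOHANG);
    return RunResult::TimedOut;
}
```

To use it with the test above, the body of main() would go into a lambda passed to run_with_timeout, with a timeout of a few seconds, and the parent would print which result came back. I am assuming here that the parent should make no HIP calls before the fork, so that the runtime is initialized only inside the child.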