HIP
HIP copied to clipboard
Bug with hipcc -O0 on gfx1035 and gfx906
Hi there,
I understand that the gfx1035 architecture on an integrated Radeon 680M is not yet officially supported by AMD. However if it is of interest to future support then there is a runtime bug with HIP when compiling with hipcc and optimisation level -O0.
This matrix multiplication code works as expected with normal compilation.
hipcc mat_mult_bugreport.cpp -o a.out
./a.out
Maximum error (infinity norm) is: 5.72205e-06
When compiling with hipcc and optimisation level -O0 it does not produce the expected result.
hipcc -O0 mat_mult_bugreport.cpp -o a.out
./a.out
Maximum error (infinity norm) is: 3.39192e+38
Please find attached the code for this below.
HIP does not support gfx1035 officially. That being said, the ISA should be similar to gfx1031 on which this runs fine.
Can you share the ROCm version and HIP version (hipcc --version), I can try to look at it.
Hi Jatin,
Sure, here is the requested info for the rocm version I am using.
HIP version: 5.4.22802-aaa1e3d8 AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.4.1 22465 d6f0fe8b22e3d8ce0f2cbd657ea14b16043018a5) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /opt/rocm-5.4.1/llvm/bin
I did not see the error on gfx90a with rocm 5.0.2, just on my laptop with gfx1035 and rocm 5.4.1 (the version above). The infinity norm should be in the range 10^(-6) - 10^(-5).
can you compile the code with -save-temps and attach the dumped *-gfx1035.bc and *-gfx1035.s files? thanks.
Sure here are the temp files for when compilation with -O0 produces an error.
hipcc -O0 -save-temps mat_mult_bugreport.cpp -o a.out
The files for buggy compilation are here
Here are the temp files for normal compilation where the code works fine.
hipcc -save-temps mat_mult_bugreport.cpp -o a.out
I saw the test passing on gfx1031 with llvm commit hash 51371ceeab92. The ISA for gfx1031 is almost the same as gfx1035, therefore it should pass on gfx1035 too. The most significant difference between the ISA that is passing and failing is saving and restore exec when restoring spilled sreg from vreg, which is likely due to https://reviews.llvm.org/D124196. This makes sense since a bug about sreg spilling could cause incorrect results.
Hopefully, the fix will be available in the next ROCm release.
Hi folks,
Just an update to this. I am still seeing the problem with ROCM 5.4.3. I now have access to an officially supported GFX906 architecture and I am seeing the same issue on GFX906 with ROCM 5.4.3. I also continue to see the problem with ROCM 5.4.3 on GFX1035.
hipcc mat_mult_bugreport.cpp -o a.out
./a.out
I get the expected result and the error is within machine precision.
Maximum error (infinity norm) is: 5.72205e-06
Now try with option -O0 and I get an erroneous result on gfx906.
hipcc -O0 mat_mult_bugreport.cpp -o a.out
./a.out
Maximum error (infinity norm) is: 24.3876
On gfx1035 with ROCM 5.4.3 I still get the erroneus result
./a.out
Maximum error (infinity norm) is: 3.25297e+38
I am experiencing a similar problem.
I am currently experimenting with rocm 5.4.2.50402-104~22.04 on Debian with a custom kernel 6.2.10 on a laptop with an AMD Rembrandt 6800u. (Initially, I was compiling for the wrong offload architecture, gfx1030 instead of gfx1035, unknowingly. Since then I was always compiling with -O0 and -g to understand the problem. But, then I ran into the following problem:
:3:hip_module.cpp :469 : 15259417612 us: 25435: [tid:0x7fb6a9a2c880] hipLaunchKernel ( 0x201050, {1,1,1}, {4,4,1}, 0x7ffdf8c44b40, 0, stream:<null> )
:3:devprogram.cpp :2676: 15259417754 us: 25435: [tid:0x7fb6a9a2c880] Using Code Object V4.
:3:devprogram.cpp :2979: 15259417988 us: 25435: [tid:0x7fb6a9a2c880] For Init/Fini: Kernel Name: _Z23matrix_transpose_kernelPfPKfj
:4:command.cpp :349 : 15259417999 us: 25435: [tid:0x7fb6a9a2c880] Command (KernelExecution) enqueued: 0x25c0100
:3:rocvirtual.cpp :703 : 15259418003 us: 25435: [tid:0x7fb6a9a2c880] Arg0: = ptr:0x7fb5a5e01000 obj:[0x7fb5a5e01000-0x7fb5a5e01040]
:3:rocvirtual.cpp :703 : 15259418004 us: 25435: [tid:0x7fb6a9a2c880] Arg1: = ptr:0x7fb5a5e00000 obj:[0x7fb5a5e00000-0x7fb5a5e00040]
:3:rocvirtual.cpp :778 : 15259418006 us: 25435: [tid:0x7fb6a9a2c880] Arg2: = val:4
:3:rocvirtual.cpp :2774: 15259418007 us: 25435: [tid:0x7fb6a9a2c880] ShaderName : _Z23matrix_transpose_kernelPfPKfj
:1:rocvirtual.cpp :2822: 15259418009 us: 25435: [tid:0x7fb6a9a2c880] Pcie atomics not enabled, hostcall not supported
:1:rocvirtual.cpp :3137: 15259418010 us: 25435: [tid:0x7fb6a9a2c880] AQL dispatch failed!
:4:command.cpp :179 : 15259418011 us: 25435: [tid:0x7fb6a9a2c880] Command 0x25c0100 complete
:3:hip_module.cpp :470 : 15259418013 us: 25435: [tid:0x7fb6a9a2c880] hipLaunchKernel: Returned hipSuccess :
:3:hip_error.cpp :27 : 15259418016 us: 25435: [tid:0x7fb6a9a2c880] hipGetLastError ( )
...
Validating transposed matrix.
Validation failed with 16 errors.
Finally I found this discussion, Otherwise I presumably would not have tried to switch to O1. When compiling warp_shuffle/main.hip with O1, the problem disappears:
:3:hip_module.cpp :469 : 15220897062 us: 25374: [tid:0x7fbdaf05d880] hipLaunchKernel ( 0x2010a0, {1,1,1}, {4,4,1}, 0x7ffe6421a5d0, 0, stream:<null> )
:3:devprogram.cpp :2676: 15220897181 us: 25374: [tid:0x7fbdaf05d880] Using Code Object V4.
:3:devprogram.cpp :2979: 15220897374 us: 25374: [tid:0x7fbdaf05d880] For Init/Fini: Kernel Name: _Z23matrix_transpose_kernelPfPKfj
:4:command.cpp :349 : 15220897384 us: 25374: [tid:0x7fbdaf05d880] Command (KernelExecution) enqueued: 0xef9460
:3:rocvirtual.cpp :703 : 15220897389 us: 25374: [tid:0x7fbdaf05d880] Arg0: = ptr:0x7fbca7201000 obj:[0x7fbca7201000-0x7fbca7201040]
:3:rocvirtual.cpp :703 : 15220897390 us: 25374: [tid:0x7fbdaf05d880] Arg1: = ptr:0x7fbca7200000 obj:[0x7fbca7200000-0x7fbca7200040]
:3:rocvirtual.cpp :778 : 15220897391 us: 25374: [tid:0x7fbdaf05d880] Arg2: = val:4
:3:rocvirtual.cpp :2774: 15220897393 us: 25374: [tid:0x7fbdaf05d880] ShaderName : _Z23matrix_transpose_kernelPfPKfj
:4:rocvirtual.cpp :862 : 15220897397 us: 25374: [tid:0x7fbdaf05d880] HWq=0x7fbdaf1d0000, Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=3, grid=[4, 4, 1], workgroup=[4, 4, 1], private_seg_size=0, group_seg_size=0, kernel_obj=0x7fbdaefca540, kernarg_address=0x7fbca6200000, completion_signal=0x0
:3:hip_error.cpp :27 : 15220897403 us: 25374: [tid:0x7fbdaf05d880] hipGetLastError ( )
....
Validating transposed matrix.
Validation passed.
@drtpotter Can you please test with latest ROCm 6.0.2 (HIP 6.0.32831) to see if issue still occurs? Thanks!
Hi! On GFX1035 I have given up trying to use the amdgpu-dkms driver because it messes around with my laptop display. This behaviour did not occur with ROCM 5.7. On GFX1035 I now just use HIP with the driver in the Linux kernel and that seems to work fine. With the open source kernel driver in kernel 6.5.0-25, GFX1035, and ROCM 6.0.2 I don't see the problem.
On GFX906 with ROCM 6.0.2 and the amdgpu-dkms driver the problem is still present.
hipcc -O0 mat_mult_bugreport.cpp -o a.out
./a.out
Maximum error (infinity norm) is: 24.3876
hipcc -O1 mat_mult_bugreport.cpp -o a.out
./a.out
Maximum error (infinity norm) is: 5.72205e-06
Feel free to incorporate this tool into any CI processes you have going!
Hi, I am also getting this error on my Radeon RX 7800 XT with rocm 6.1.2
proof of concept, opti.cu
:
#include <hip/hip_runtime.h>
inline __device__ void do_nothing(double *arr) { return; }
__global__ void kernel()
{
double arr[2045];
do_nothing(arr);
}
int main()
{
kernel<<<22, 33>>>();
hipError_t _ = hipDeviceSynchronize();
return 0;
}
when I compile with -OO I get a segfault, but not -O1
> hipcc -O0 --offload-arch=gfx1101 opti.cu -o opti && ./opti
Memory access fault by GPU node-2 (Agent handle: 0x1f48310) on address 0x7f9b9fe00000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
> hipcc -O1 --offload-arch=gfx1101 opti.cu -o opti && ./opti
# no segfault
hipcc --version
HIP version: 6.1.40093-bd86f1708
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-6.1.2 24193 669db884972e769450470020c06a6f132a8a065b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.1.2/llvm/bin
Configuration file: /opt/rocm-6.1.2/lib/llvm/bin/clang++.cfg
That sounds like a compiler bug. Will check out the output from the clang version you provided. Will update this when I have something.
The team can not seem to reproduce the issue internally.
I can reproduce the optimisation dependent error with
HIP version: 5.7.31921-1949b1621
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.7.3 23382 f3e174a1d286158c06e4cc8276366b1d4bc0c914)
Target: x86_64-unknown-linux-gnu
Thread model: posix
and the compilation flags of
for ((i=0;i<4;i++))
do
hipcc -O${i} --offload-arch=gfx90a opti.cu -o opti-${i}
./opt-${i} 1> log.${i}.txt 2> errors.${i}.txt
done
But the oddness is that it is dependent on the array declaration in the code, specifically for smaller sizes of an array there is no issue.
__global__ void kernel()
{
double arr[2045]; //sizes smaller that 2045 seem to work just fine.
do_nothing(arr);
}
Do you have ideas as to what the issue might be?
Can you perhaps provide the symbols from your builds (the output from nm
)? It might be useful.
Cheers, Pascal
Hi @drtpotter, I have used the mat_mult_bugreport.cpp.txt you provided on a Radeon Pro VII (gfx906) but was unable to reproduce the issue. Can you upgrade to ROCm 6.2 and check if the issue is still present?
Sorry @sohaibnd the issue still remains on gfx906.
hipconfig
HIP version: 6.2.41133-dd7f95766
==hipconfig
HIP_PATH :/opt/rocm-6.2.0
ROCM_PATH :/opt/rocm-6.2.0
HIP_COMPILER :clang
HIP_PLATFORM :amd
HIP_RUNTIME :rocclr
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-6.2.0/include -I/include
==hip-clang
HIP_CLANG_PATH :/opt/rocm-6.2.0/lib/llvm/bin
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.2.0 24292 26466ce804ac523b398608f17388eb6d605a3f09)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.2.0/lib/llvm/bin
Configuration file: /opt/rocm-6.2.0/lib/llvm/bin/clang++.cfg
AMD LLVM version 18.0.0git
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: tigerlake
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
-O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link
== Environment Variables
PATH =/opt/rocm-6.2.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH=/opt/rocm-6.2.0/lib/llvm/lib:/opt/rocm-6.2.0/lib
== Linux Kernel
Hostname :
workshop
Linux workshop 6.5.0-25-generic #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Feb 20 16:09:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
dkms status
amdgpu/6.3.6-1718217.22.04, 6.5.0-25-generic, x86_64: installed
hipcc -O0 mat_mult_bugreport.cpp; ./a.out
Maximum error (infinity norm) is: 24.3876
This is wrong. Then if I try
hipcc -O1 mat_mult_bugreport.cpp; ./a.out
Maximum error (infinity norm) is: 5.72205e-06
This is correct.
Is the issue still present on the gfx1035 card as well?
Hey @sohaibnd, on gfx1035 (without amdgpu-dkms because it somehow messes with my screen?) I don't see the error,
hipcc -O0 mat_mult_bugreport.cpp; ./a.out
Maximum error (infinity norm) is: 5.72205e-06
hipcc -O1 mat_mult_bugreport.cpp; ./a.out
Maximum error (infinity norm) is: 5.72205e-06
which is good news! I still see the error on gfx906 though.
Got it, thanks for your help in investigating this issue. While it is possible this may be a compiler bug specific to gfx906, support for gfx906 is now deprecated. I suggest compiling with -O1/-O2/-O3 as a workaround if you still want to use the gfx906 card.
I'm going to close this ticket but if you come across this issue on one of the supported GPUs, feel free to re-open it and we can look into it further.