HIP icon indicating copy to clipboard operation
HIP copied to clipboard

Bug with hipcc -O0 on gfx1035 and gfx906

Open drtpotter opened this issue 1 year ago • 14 comments

Hi there,

I understand that the gfx1035 architecture on an integrated Radeon 680M is not yet officially supported by AMD. However if it is of interest to future support then there is a runtime bug with HIP when compiling with hipcc and optimisation level -O0.

This matrix multiplication code works as expected with normal compilation.

hipcc mat_mult_bugreport.cpp -o a.out ./a.out Maximum error (infinity norm) is: 5.72205e-06

When compiling with hipcc and optimisation level -O0 it does not produce the expected result.

hipcc -O0 mat_mult_bugreport.cpp -o a.out ./a.out Maximum error (infinity norm) is: 3.39192e+38

Please find attached the code for this below.

mat_mult_bugreport.cpp.txt

drtpotter avatar Mar 17 '23 04:03 drtpotter

HIP does not support gfx1035 officially. That being said, the ISA should be similar to gfx1031 on which this runs fine.

Can you share the ROCm version and HIP version (hipcc --version), I can try to look at it.

jatinx avatar Mar 17 '23 11:03 jatinx

Hi Jatin,

Sure, here is the requested info for the rocm version I am using.

HIP version: 5.4.22802-aaa1e3d8 AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.4.1 22465 d6f0fe8b22e3d8ce0f2cbd657ea14b16043018a5) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /opt/rocm-5.4.1/llvm/bin

I did not see the error on gfx90a with rocm 5.0.2, just on my laptop with gfx1035 and rocm 5.4.1 (the version above). The infinity norm should be in the range 10^(-6) - 10^(-5).

drtpotter avatar Mar 17 '23 11:03 drtpotter

can you compile the code with -save-temps and attach the dumped *-gfx1035.bc and *-gfx1035.s files? thanks.

yxsamliu avatar Mar 17 '23 17:03 yxsamliu

Sure here are the temp files for when compilation with -O0 produces an error.

hipcc -O0 -save-temps mat_mult_bugreport.cpp -o a.out

The files for buggy compilation are here

mat_mult_bugreport-hip-amdgcn-amd-amdhsa-gfx1035.bc.txt

mat_mult_bugreport-hip-amdgcn-amd-amdhsa-gfx1035.s.txt

drtpotter avatar Mar 17 '23 23:03 drtpotter

Here are the temp files for normal compilation where the code works fine.

hipcc -save-temps mat_mult_bugreport.cpp -o a.out

mat_mult_bugreport-hip-amdgcn-amd-amdhsa-gfx1035.bc.txt

mat_mult_bugreport-hip-amdgcn-amd-amdhsa-gfx1035.s.txt

drtpotter avatar Mar 18 '23 00:03 drtpotter

I saw the test passing on gfx1031 with llvm commit hash 51371ceeab92. The ISA for gfx1031 is almost the same as gfx1035, therefore it should pass on gfx1035 too. The most significant difference between the ISA that is passing and failing is saving and restore exec when restoring spilled sreg from vreg, which is likely due to https://reviews.llvm.org/D124196. This makes sense since a bug about sreg spilling could cause incorrect results.

Hopefully, the fix will be available in the next ROCm release.

yxsamliu avatar Mar 18 '23 03:03 yxsamliu

Hi folks,

Just an update to this. I am still seeing the problem with ROCM 5.4.3. I now have access to an officially supported GFX906 architecture and I am seeing the same issue on GFX906 with ROCM 5.4.3. I also continue to see the problem with ROCM 5.4.3 on GFX1035.

hipcc mat_mult_bugreport.cpp -o a.out ./a.out

I get the expected result and the error is within machine precision.

Maximum error (infinity norm) is: 5.72205e-06

Now try with option -O0 and I get an erroneous result on gfx906.

hipcc -O0 mat_mult_bugreport.cpp -o a.out

./a.out Maximum error (infinity norm) is: 24.3876

On gfx1035 with ROCM 5.4.3 I still get the erroneus result ./a.out Maximum error (infinity norm) is: 3.25297e+38

drtpotter avatar Apr 12 '23 23:04 drtpotter

I am experiencing a similar problem.

I am currently experimenting with rocm 5.4.2.50402-104~22.04 on Debian with a custom kernel 6.2.10 on a laptop with an AMD Rembrandt 6800u. (Initially, I was compiling for the wrong offload architecture, gfx1030 instead of gfx1035, unknowingly. Since then I was always compiling with -O0 and -g to understand the problem. But, then I ran into the following problem:

:3:hip_module.cpp           :469 : 15259417612 us: 25435: [tid:0x7fb6a9a2c880]  hipLaunchKernel ( 0x201050, {1,1,1}, {4,4,1}, 0x7ffdf8c44b40, 0, stream:<null> ) 
:3:devprogram.cpp           :2676: 15259417754 us: 25435: [tid:0x7fb6a9a2c880] Using Code Object V4.
:3:devprogram.cpp           :2979: 15259417988 us: 25435: [tid:0x7fb6a9a2c880] For Init/Fini: Kernel Name: _Z23matrix_transpose_kernelPfPKfj
:4:command.cpp              :349 : 15259417999 us: 25435: [tid:0x7fb6a9a2c880] Command (KernelExecution) enqueued: 0x25c0100
:3:rocvirtual.cpp           :703 : 15259418003 us: 25435: [tid:0x7fb6a9a2c880] Arg0:   = ptr:0x7fb5a5e01000 obj:[0x7fb5a5e01000-0x7fb5a5e01040]
:3:rocvirtual.cpp           :703 : 15259418004 us: 25435: [tid:0x7fb6a9a2c880] Arg1:   = ptr:0x7fb5a5e00000 obj:[0x7fb5a5e00000-0x7fb5a5e00040]
:3:rocvirtual.cpp           :778 : 15259418006 us: 25435: [tid:0x7fb6a9a2c880] Arg2:   = val:4
:3:rocvirtual.cpp           :2774: 15259418007 us: 25435: [tid:0x7fb6a9a2c880] ShaderName : _Z23matrix_transpose_kernelPfPKfj
:1:rocvirtual.cpp           :2822: 15259418009 us: 25435: [tid:0x7fb6a9a2c880] Pcie atomics not enabled, hostcall not supported
:1:rocvirtual.cpp           :3137: 15259418010 us: 25435: [tid:0x7fb6a9a2c880] AQL dispatch failed!
:4:command.cpp              :179 : 15259418011 us: 25435: [tid:0x7fb6a9a2c880] Command 0x25c0100 complete
:3:hip_module.cpp           :470 : 15259418013 us: 25435: [tid:0x7fb6a9a2c880] hipLaunchKernel: Returned hipSuccess : 
:3:hip_error.cpp            :27  : 15259418016 us: 25435: [tid:0x7fb6a9a2c880]  hipGetLastError (  ) 
...
Validating transposed matrix.
Validation failed with 16 errors.

Finally I found this discussion, Otherwise I presumably would not have tried to switch to O1. When compiling warp_shuffle/main.hip with O1, the problem disappears:

:3:hip_module.cpp           :469 : 15220897062 us: 25374: [tid:0x7fbdaf05d880]  hipLaunchKernel ( 0x2010a0, {1,1,1}, {4,4,1}, 0x7ffe6421a5d0, 0, stream:<null> ) 
:3:devprogram.cpp           :2676: 15220897181 us: 25374: [tid:0x7fbdaf05d880] Using Code Object V4.
:3:devprogram.cpp           :2979: 15220897374 us: 25374: [tid:0x7fbdaf05d880] For Init/Fini: Kernel Name: _Z23matrix_transpose_kernelPfPKfj
:4:command.cpp              :349 : 15220897384 us: 25374: [tid:0x7fbdaf05d880] Command (KernelExecution) enqueued: 0xef9460
:3:rocvirtual.cpp           :703 : 15220897389 us: 25374: [tid:0x7fbdaf05d880] Arg0:   = ptr:0x7fbca7201000 obj:[0x7fbca7201000-0x7fbca7201040]
:3:rocvirtual.cpp           :703 : 15220897390 us: 25374: [tid:0x7fbdaf05d880] Arg1:   = ptr:0x7fbca7200000 obj:[0x7fbca7200000-0x7fbca7200040]
:3:rocvirtual.cpp           :778 : 15220897391 us: 25374: [tid:0x7fbdaf05d880] Arg2:   = val:4
:3:rocvirtual.cpp           :2774: 15220897393 us: 25374: [tid:0x7fbdaf05d880] ShaderName : _Z23matrix_transpose_kernelPfPKfj
:4:rocvirtual.cpp           :862 : 15220897397 us: 25374: [tid:0x7fbdaf05d880] HWq=0x7fbdaf1d0000, Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=3, grid=[4, 4, 1], workgroup=[4, 4, 1], private_seg_size=0, group_seg_size=0, kernel_obj=0x7fbdaefca540, kernarg_address=0x7fbca6200000, completion_signal=0x0
:3:hip_error.cpp            :27  : 15220897403 us: 25374: [tid:0x7fbdaf05d880]  hipGetLastError (  ) 
....
Validating transposed matrix.
Validation passed.

goetzgaycken avatar Apr 14 '23 17:04 goetzgaycken

@drtpotter Can you please test with latest ROCm 6.0.2 (HIP 6.0.32831) to see if issue still occurs? Thanks!

ppanchad-amd avatar Apr 11 '24 14:04 ppanchad-amd

Hi! On GFX1035 I have given up trying to use the amdgpu-dkms driver because it messes around with my laptop display. This behaviour did not occur with ROCM 5.7. On GFX1035 I now just use HIP with the driver in the Linux kernel and that seems to work fine. With the open source kernel driver in kernel 6.5.0-25, GFX1035, and ROCM 6.0.2 I don't see the problem.

On GFX906 with ROCM 6.0.2 and the amdgpu-dkms driver the problem is still present.

hipcc -O0 mat_mult_bugreport.cpp -o a.out
./a.out
Maximum error (infinity norm) is: 24.3876
hipcc -O1 mat_mult_bugreport.cpp -o a.out
./a.out
Maximum error (infinity norm) is: 5.72205e-06

Feel free to incorporate this tool into any CI processes you have going!

drtpotter avatar Apr 11 '24 23:04 drtpotter

Hi, I am also getting this error on my Radeon RX 7800 XT with rocm 6.1.2

proof of concept, opti.cu:

#include <hip/hip_runtime.h>

inline __device__ void do_nothing(double *arr) { return; }

__global__ void kernel()
{
    double arr[2045];
    do_nothing(arr);
}

int main()
{
    kernel<<<22, 33>>>();
    hipError_t _ = hipDeviceSynchronize();
    return 0;
}

when I compile with -OO I get a segfault, but not -O1

> hipcc -O0 --offload-arch=gfx1101 opti.cu -o opti && ./opti
Memory access fault by GPU node-2 (Agent handle: 0x1f48310) on address 0x7f9b9fe00000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
> hipcc -O1 --offload-arch=gfx1101 opti.cu -o opti && ./opti
# no segfault
hipcc --version
HIP version: 6.1.40093-bd86f1708
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-6.1.2 24193 669db884972e769450470020c06a6f132a8a065b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.1.2/llvm/bin
Configuration file: /opt/rocm-6.1.2/lib/llvm/bin/clang++.cfg

d3v-null avatar Jun 13 '24 02:06 d3v-null

That sounds like a compiler bug. Will check out the output from the clang version you provided. Will update this when I have something.

cjatin avatar Jun 13 '24 10:06 cjatin

The team can not seem to reproduce the issue internally.

cjatin avatar Jun 21 '24 08:06 cjatin

I can reproduce the optimisation dependent error with

HIP version: 5.7.31921-1949b1621
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.7.3 23382 f3e174a1d286158c06e4cc8276366b1d4bc0c914)
Target: x86_64-unknown-linux-gnu
Thread model: posix

and the compilation flags of

for ((i=0;i<4;i++))
do 
  hipcc -O${i} --offload-arch=gfx90a opti.cu -o opti-${i}
  ./opt-${i}  1> log.${i}.txt 2> errors.${i}.txt
done

But the oddness is that it is dependent on the array declaration in the code, specifically for smaller sizes of an array there is no issue.

__global__ void kernel()
{
    double arr[2045]; //sizes smaller that 2045 seem to work just fine. 
    do_nothing(arr);
}

Do you have ideas as to what the issue might be?

Can you perhaps provide the symbols from your builds (the output from nm)? It might be useful.

Cheers, Pascal

pelahi avatar Jun 25 '24 05:06 pelahi

Hi @drtpotter, I have used the mat_mult_bugreport.cpp.txt you provided on a Radeon Pro VII (gfx906) but was unable to reproduce the issue. Can you upgrade to ROCm 6.2 and check if the issue is still present?

sohaibnd avatar Oct 08 '24 14:10 sohaibnd

Sorry @sohaibnd the issue still remains on gfx906.

hipconfig
HIP version: 6.2.41133-dd7f95766

==hipconfig
HIP_PATH           :/opt/rocm-6.2.0
ROCM_PATH          :/opt/rocm-6.2.0
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-6.2.0/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm-6.2.0/lib/llvm/bin
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.2.0 24292 26466ce804ac523b398608f17388eb6d605a3f09)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.2.0/lib/llvm/bin
Configuration file: /opt/rocm-6.2.0/lib/llvm/bin/clang++.cfg
AMD LLVM version 18.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: tigerlake

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
 -O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link

== Environment Variables
PATH =/opt/rocm-6.2.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH=/opt/rocm-6.2.0/lib/llvm/lib:/opt/rocm-6.2.0/lib

== Linux Kernel
Hostname      :
workshop
Linux workshop 6.5.0-25-generic #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Feb 20 16:09:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy
dkms status
amdgpu/6.3.6-1718217.22.04, 6.5.0-25-generic, x86_64: installed
hipcc -O0 mat_mult_bugreport.cpp; ./a.out 
Maximum error (infinity norm) is: 24.3876

This is wrong. Then if I try

hipcc -O1 mat_mult_bugreport.cpp; ./a.out 
Maximum error (infinity norm) is: 5.72205e-06

This is correct.

drtpotter avatar Oct 09 '24 01:10 drtpotter

Is the issue still present on the gfx1035 card as well?

sohaibnd avatar Oct 09 '24 01:10 sohaibnd

Hey @sohaibnd, on gfx1035 (without amdgpu-dkms because it somehow messes with my screen?) I don't see the error,

hipcc -O0 mat_mult_bugreport.cpp; ./a.out 
Maximum error (infinity norm) is: 5.72205e-06
hipcc -O1 mat_mult_bugreport.cpp; ./a.out 
Maximum error (infinity norm) is: 5.72205e-06

which is good news! I still see the error on gfx906 though.

drtpotter avatar Oct 09 '24 01:10 drtpotter

Got it, thanks for your help in investigating this issue. While it is possible this may be a compiler bug specific to gfx906, support for gfx906 is now deprecated. I suggest compiling with -O1/-O2/-O3 as a workaround if you still want to use the gfx906 card.

I'm going to close this ticket but if you come across this issue on one of the supported GPUs, feel free to re-open it and we can look into it further.

sohaibnd avatar Oct 09 '24 21:10 sohaibnd