MIOpen v_mac in gfx10 architectures not supported

gfx10 does not support the below instructions present in v4r1 and other kernels:

In file included from gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp:1:
In file included from ./common_header.hpp:22:
./amd_inline_asm.hpp:24:21: error: instruction not supported on this GPU
    asm volatile("\n \
                    ^
<inline asm>:2:14: note: instantiated into assembly here
             v_mac_f32 v65, v95, v99
             ^
In file included from gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp:1:
In file included from ./common_header.hpp:22:
./amd_inline_asm.hpp:25:36: error: instruction not supported on this GPU
            v_mac_f32 %0, %4, %5 \n \
                                   ^
<inline asm>:3:14: note: instantiated into assembly here
             v_mac_f32 v63, v95, v100
             ^
In file included from gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp:1:
In file included from ./common_header.hpp:22:
./amd_inline_asm.hpp:26:36: error: instruction not supported on this GPU
            v_mac_f32 %1, %4, %6 \n \
                                   ^
<inline asm>:4:14: note: instantiated into assembly here
             v_mac_f32 v62, v95, v101
             ^
In file included from gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp:1:
In file included from ./common_header.hpp:22:
./amd_inline_asm.hpp:27:36: error: instruction not supported on this GPU
            v_mac_f32 %2, %4, %7 \n \
                                   ^
<inline asm>:5:14: note: instantiated into assembly here
             v_mac_f32 v61, v95, v102
             ^
In file included from gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp:1:
In file included from ./common_header.hpp:22:
./amd_inline_asm.hpp:24:21: error: instruction not supported on this GPU
    asm volatile("\n \
                    ^
<inline asm>:2:14: note: instantiated into assembly here
             v_mac_f32 v56, v96, v99
             ^
In file included from gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp:1:
In file included from ./common_header.hpp:22:
./amd_inline_asm.hpp:25:36: error: instruction not supported on this GPU
            v_mac_f32 %0, %4, %5 \n \
                                   ^
<inline asm>:3:14: note: instantiated into assembly here
             v_mac_f32 v55, v96, v100
             ^
In file included from gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp:1:
In file included from ./common_header.hpp:22:
./amd_inline_asm.hpp:26:36: error: instruction not supported on this GPU
            v_mac_f32 %1, %4, %6 \n \
                                   ^
<inline asm>:4:14: note: instantiated into assembly here
             v_mac_f32 v54, v96, v101
             ^
In file included from gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp:1:
In file included from ./common_header.hpp:22:
./amd_inline_asm.hpp:27:36: error: instruction not supported on this GPU
            v_mac_f32 %2, %4, %7 \n \
                                   ^
<inline asm>:5:14: note: instantiated into assembly here
             v_mac_f32 v53, v96, v102

Oct 05 '20 17:10 daniellowell

Hi @daniellowell, v_mac_f32 should be a valid instruction in the gfx10 ISA. I just tested it at the assembler level on our ROCm 3.7 install: echo "v_mac_f32 v2, v3, v4" | /opt/rocm/llvm/bin/llvm-mc -arch=amdgcn -mcpu=gfx1000 -show-encoding -show-inst

Maybe the compile flags are not set properly for gfx10?

Oct 08 '20 02:10 carlushuang

IIRC VOP2 v_mac_f32 should be valid for all gfx10 parts; see https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX10.html. Please inform Dmitry Preobrazhensky if the assembler has issues with it.

Maybe there is an issue in the high-level compiler (which can be considered an intermediate layer between the inline assembly code and the llvm-mc layer). If something is wrong, please open a Jira ticket.

Oct 08 '20 15:10 atamazov

If there is a compiler or assembler issue, it is possible to use v_mad_f32 or v_fmac_f32 as a workaround, I think.

Oct 08 '20 15:10 atamazov

Ah, gfx1030 should have deprecated both v_mad_f32 and v_mac_f32; we need to use v_fmac_f32 instead.

Oct 10 '20 01:10 carlushuang

@ltqin Please give me an ETA on when you think this can be completed.

Oct 12 '20 16:10 daniellowell

I recommend extending inst_wrappers.inc with _v_mac_f32 macro and using it in the kernels.
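
For the C++ kernels that use inline assembly, the same wrapper idea could be sketched at the preprocessor level. This is only an illustration, not the contents of inst_wrappers.inc: the names SKETCH_V_MAC_F32, SKETCH_MAC_F32_LINE and mac_f32_line are hypothetical, and it assumes the compiler defines the per-target macro __gfx1030__ when building device code for gfx1030 (on other targets and on the host it stays undefined).

```cpp
#include <string_view>

// Hypothetical wrapper: choose the multiply-accumulate mnemonic per target.
// Assumption: __gfx1030__ is defined only when compiling for gfx1030.
#if defined(__gfx1030__)
#define SKETCH_V_MAC_F32 "v_fmac_f32"
#else
#define SKETCH_V_MAC_F32 "v_mac_f32"
#endif

// Builds one line of inline-asm text. Adjacent string literals are
// concatenated by the compiler, so the mnemonic can be swapped without
// touching the operand list.
#define SKETCH_MAC_F32_LINE(dst, a, b) SKETCH_V_MAC_F32 " " dst ", " a ", " b

// Example line using the same operand placeholders as the failing asm block.
constexpr std::string_view mac_f32_line = SKETCH_MAC_F32_LINE("%0", "%4", "%5");
```

With a wrapper like this the kernels keep a single asm template, and only the mnemonic changes between targets.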

Oct 12 '20 22:10 atamazov

@daniellowell The task seems simple, but there are some strange problems in the test. If the task is not urgent, I will finish it by November 15th. Is that ok?

Oct 15 '20 09:10 ltqin

Replacing v_mac_f32 directly with v_fmac_f32 compiles on gfx1030, but the running results can not be verified, and the hip version of fp32 also fails to pass the verification, while fp16 is correct. Next I will check whether there is a problem with the installation environment.

Oct 16 '20 00:10 ltqin

> but the running results can not be verified

Kernels with inline v_fmac_f32 fail verification?

> the hip version of fp32 also fails...

Do you mean "kernels without inline assembly code"?

Oct 16 '20 14:10 atamazov

>> but the running results can not be verified
>
> Kernels with inline v_fmac_f32 fail verification?

YES

>> the hip version of fp32 also fails...
>
> Do you mean "kernels without inline assembly code"?

YES

Oct 17 '20 15:10 ltqin

>>> the hip version of fp32 also fails...
>>
>> Do you mean "kernels without inline assembly code"?
>
> YES

Then there is a general HIP compilation problem.

Most likely, the v_mac_f32 -> v_fmac_f32 substitution is correct. Actually, it can't be incorrect, except in cases that are VERY sensitive to precision, which is not the case for convolutions. Please go ahead with v_mac_f32 -> v_fmac_f32 for gfx10.
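
To make the precision remark concrete, here is a small host-side sketch (not GPU code) comparing an unfused multiply-add with a fused one. It assumes v_mac_f32 behaves like the unfused form, rounding the product before the add, while v_fmac_f32 rounds once; the function names mac_unfused and mac_fused are made up for this illustration.

```cpp
#include <cmath>

// Unfused multiply-add: the product is rounded to float, then the sum is
// rounded again. The volatile store forces the intermediate rounding and
// keeps the compiler from contracting the expression into an FMA.
inline float mac_unfused(float a, float b, float c)
{
    volatile float prod = a * b;
    return prod + c;
}

// Fused multiply-add: a * b + c is computed exactly and rounded once.
inline float mac_fused(float a, float b, float c)
{
    return std::fma(a, b, c);
}
```

The two only disagree when the low bits of the product matter, for example under heavy cancellation: with a = b = 1 + 2^-12 and c = -(1 + 2^-11), the unfused form returns 0 while the fused form returns 2^-24. Sums of many products, as in a convolution, are normally not that sensitive.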

Oct 17 '20 22:10 atamazov

@ltqin [off-topic][GitHub formatting] Please use one empty line after a citation mark, otherwise the formatting will be incorrect. Valid:

>> Coala eats
>
> shoots

and leaves.

Incorrect:

>> Coala eats
> shoots
and leaves.

Oct 17 '20 22:10 atamazov

@atamazov Okay, I got it

Oct 19 '20 08:10 ltqin

When the flag "CK_USE_AMD_BUFFER_ADDRESSING" is set to zero, the test passes (both with inline v_fmac_f32 and without inline assembly code). Does "amdgcn_buffer_load_f32X" not work on gfx1030?

Oct 19 '20 08:10 ltqin

@ltqin Please disable buffer_load for gfx1030.

Oct 19 '20 16:10 asroy

@asroy

> Please disable buffer_load for gfx1030.

This should be a workaround (due to compiler issues), because buffer insns are supported on gfx1030, right?
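
A per-target default for the flag could look roughly like the sketch below. Only CK_USE_AMD_BUFFER_ADDRESSING comes from this thread; the default value of 1 on other targets, the use of __gfx1030__ as the target check, and the helper constant use_amd_buffer_addressing are assumptions made for illustration.

```cpp
// Let a build-system definition win; otherwise pick a per-target default.
#ifndef CK_USE_AMD_BUFFER_ADDRESSING
#if defined(__gfx1030__)
// Workaround: skip the buffer addressing path on gfx1030 until the
// compiler issue is understood (the hardware does support buffer insns).
#define CK_USE_AMD_BUFFER_ADDRESSING 0
#else
// Assumed default for the other targets.
#define CK_USE_AMD_BUFFER_ADDRESSING 1
#endif
#endif

// Hypothetical helper so code can branch on the flag with `if constexpr`.
constexpr bool use_amd_buffer_addressing = (CK_USE_AMD_BUFFER_ADDRESSING != 0);
```

Keeping the `#ifndef` guard means the workaround can still be overridden from the command line once a fixed compiler is available.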

Oct 19 '20 17:10 atamazov

@ltqin Is this still an issue with ROCm 6.1.1? If not, can we close the bug? Thanks!

Mar 17 '24 03:03 ppanchad-amd