Basic build for gfx1010

Open mihirparadkar opened this issue 5 years ago • 1 comments

I'm trying to build this library so that I can link it to rocBLAS and have a functioning gemm implementation in HIP (similarly to this PR) for gfx1010/gfx1012.

I saw a commit titled "Kernels now working on gfx1010." and I'm wondering what set of environment variables and compilers is needed to accomplish this.

I'm using Linux Mint 19.3 with ROCm 3.3.0 on a RX 5500 XT (gfx1012) and RX 5700 XT (gfx1010). Host processor is Ryzen 9 3900X.

I first tried

python3 ../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_asm_only.yaml ./

Output: Tensile_default_output.TXT.zip

Most notably, I don't think HCC supports gfx1010 and up, so I see a lot of compiler errors that look like:

'['/opt/rocm/bin/hcc', '-x', 'assembler', '-target', 'amdgcn-amd-amdhsa', '-mno-code-object-v3', '-mcpu=gfx1010', '-mwavefrontsize64', '-c', '-o', '/home/mihir/Programs/TensileOcl/Tensile/build_hip/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_BF/sourceTmp/assembly/Cijk_Ailk_Bljk_SB_MT64x32x32_SE_AMAS0_EPS1_GRVW1_K1_PGR1_TT8_4_VW1_WG8_8_4_WGM1.o', '/home/mihir/Programs/TensileOcl/Tensile/build_hip/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_BF/sourceTmp/assembly/Cijk_Ailk_Bljk_SB_MT64x32x32_SE_AMAS0_EPS1_GRVW1_K1_PGR1_TT8_4_VW1_WG8_8_4_WGM1.s']' returned non-zero exit status 1.
/home/mihir/Programs/TensileOcl/Tensile/build_hip/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_BF/sourceTmp/assembly/Cijk_Ailk_Bljk_SB_MT32x32x32_SE_AMAS0_EPS0_GRVW1_K1_PGR0_TT4_4_VW1_WG8_8_4_WGM8.s:1461:1: /home/mihir/Programs/TensileOcl/Tensile/build_hip/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_BF/sourceTmp/assembly/Cijk_Ailk_Bljk_SB_MT32x64x32_SE_AMAS0_EPS1_GRVW1_K1_PGR1_TT4_8_VW1_WG8_8_4_WGM1.s:2346:1: error: instruction not supported on this GPU
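To isolate one failing kernel from the rest of the benchmark run, the assembler call can be replayed by hand. The sketch below rebuilds the argument list from the failing command above; the flags are copied from that log, while the function names `assembler_cmd` and `try_assemble` are hypothetical helpers and not part of Tensile. Pointing `compiler` at a different binary is only a way to test whether another toolchain accepts the same `.s` file, not a confirmed fix.

```python
import subprocess


def assembler_cmd(compiler, arch, src, obj):
    """Build the assembler argv seen in the failing log (flags copied verbatim)."""
    return [
        compiler,
        "-x", "assembler",
        "-target", "amdgcn-amd-amdhsa",
        "-mno-code-object-v3",
        f"-mcpu={arch}",
        "-mwavefrontsize64",
        "-c",
        "-o", obj,
        src,
    ]


def try_assemble(compiler, arch, src, obj):
    """Run the assembler once and return (exit_code, stderr) for inspection."""
    result = subprocess.run(
        assembler_cmd(compiler, arch, src, obj),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

Calling `try_assemble("/opt/rocm/bin/hcc", "gfx1010", "kernel.s", "kernel.o")` with one of the generated `.s` files (file names here are placeholders) should reproduce the "instruction not supported on this GPU" message without rerunning the whole benchmark.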

Additionally, the generated run.sh script tries to set the graphics card clock to an invalid level:

+ /opt/rocm/bin/rocm-smi -d 0 --setfan 255 --setsclk 7
[sudo] password for mihir:          


========================ROCm System Management Interface========================
ERROR: GPU[0] 		: Unable to set clock level
ERROR: GPU[0]	: Max clock level is 2
GPU[0] 		: Successfully set fan control to 'manual'
GPU[0] 		: Successfully set fan speed to Level 255
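One possible workaround is to clamp the requested level to whatever the card reports as its maximum before calling rocm-smi. The sketch below only parses the "Max clock level is N" message shown in the output above; the helper names `max_clock_level` and `clamp_sclk` are hypothetical, and this is not how Tensile generates run.sh.

```python
import re


def max_clock_level(smi_output):
    """Return the max clock level named in rocm-smi error text, or None if absent."""
    match = re.search(r"Max clock level is (\d+)", smi_output)
    return int(match.group(1)) if match else None


def clamp_sclk(requested, smi_output):
    """Clamp the requested sclk level to the reported maximum, if one was reported."""
    limit = max_clock_level(smi_output)
    if limit is None:
        return requested
    return min(requested, limit)
```

With the output above, a request for level 7 would be reduced to level 2, which could then be passed to `--setsclk` instead of the hard-coded value.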

I also tried setting --cxx-compiler to hipcc with $HIP_PLATFORM set to clang, but the CMake-generated flags include -hc, which the compiler doesn't recognize. I could get a little further by editing TensileCreateLibrary.py to pass in -D__HIP_VDI__, manually editing the generated flags.make in the build files, and setting the benchmark config to not build a new client. Even so, I inevitably run into one of the above issues.

Is there a recommended way to directly build a basic gemm kernel in pure HIP that can be used by rocBLAS without using the benchmarking driver program?

mihirparadkar, Apr 14 '20 06:04

@mihirparadkar Apologies for the lack of response. Can you please check if you are seeing the same issue with the latest ROCm 6.1.2? Thanks!

ppanchad-amd, Jul 09 '24 19:07

@mihirparadkar Closing ticket. Please re-open the ticket if you still see the issue with the latest ROCm. Thanks!

ppanchad-amd, Sep 16 '24 19:09