hipBLASLt Build Failure During Tensile Libraries Generation

Local ROCm version: 5.2.5.1
hipBLASLt version used in build: release/rocm-rel-5.5
Python version: 3.10
CPU: POWER9
GPU: gfx906

We need hipBLASLt because of bitsandbytes-rocm/ops.cu:400, which is required for 8-bit loading of HuggingFace language models. Unfortunately, the current implementation seems to rely on hipBLASLt for 8-bit matmul and lacks a 4-bit implementation. Would you say that, for gfx906/gfx908, hipBLASLt provides an advantage over hipBLAS in 8-bit or 4-bit inference?

During the build process, the following commands were used:

CMake command: `cmake -DAMDGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_C_COMPILER=hipcc -G "Unix Makefiles" ..`
Make command: `make -j16`

CMake did not report any errors. However, the build failed at the "Generating Tensile Libraries" target, immediately after displaying the message "Reading logic files: Launching 32 threads...". The build failure persists even when configuring using install.sh with AMDGPU_TARGETS hardcoded to gfx906.

traceback:

cmake.log make.log

rocminfo: rocminfo.txt

Update: The same error seems to appear when compiling with ROCm 5.5.

May 21 '23 12:05 hovertank3d

@hovertank3d hipBLASLt only supports gfx90a so far. You can find the supported data types and hardware requirements in the README.

Jun 13 '23 06:06 jichangjichang
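
For anyone hitting the same failure, here is a minimal sketch of a pre-build check that compares the GPU architectures reported by rocminfo against the targets hipBLASLt supports. The supported set below contains only gfx90a, taken from the reply above, and may be out of date; the function names and the regular expression are assumptions for illustration, not part of hipBLASLt or ROCm.

```python
import re

# Supported targets as stated in the maintainer's reply (June 2023).
# Assumption for illustration: check the hipBLASLt README for the current list.
SUPPORTED_TARGETS = {"gfx90a"}


def detect_targets(rocminfo_text):
    """Return the sorted, de-duplicated gfx architecture names found in rocminfo output."""
    return sorted(set(re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_text)))


def unsupported_targets(rocminfo_text, supported=SUPPORTED_TARGETS):
    """Return the detected architectures that are not in the supported set."""
    return [t for t in detect_targets(rocminfo_text) if t not in supported]
```

To use it, pass in the saved rocminfo output, for example `unsupported_targets(open("rocminfo.txt").read())`. A non-empty result means at least one detected GPU is outside the supported set, so the build is not expected to produce usable kernels for it.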

@hovertank3d Please check whether your issue still occurs with the latest ROCm 6.1.2. If not, please close the ticket. Thanks!

Jul 09 '24 14:07 ppanchad-amd

@hovertank3d Closing the ticket. Please feel free to re-open it if you need assistance. Thanks!

Oct 23 '24 19:10 ppanchad-amd