[AMD] Fix compilation issue with ROCm
Problem: Unable to install the package on a Linux machine with an AMD 6800XT GPU using ROCm.
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video docker.io/rocm/dev-ubuntu-22.04:5.6-complete bash
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
# Compiling after setting CC and CXX env variables also failed with a similar error.
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
Error logs: https://gist.github.com/bhargav/7f8c2984ba32ff99ce8e93433d9059a6
Solution: The failures are caused by references to CUDA library imports instead of their HIP equivalents when compiling for AMD.
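For context, ggml's HIP port compiles the same ggml-cuda.cu source for both backends by aliasing CUDA API names to their HIP equivalents with preprocessor defines when built for ROCm; the failing references presumably bypassed that mapping. A minimal sketch of the pattern (the guard macro and alias list here are illustrative, not the exact set in the patch):

// Sketch of the CUDA-to-HIP aliasing pattern used for ROCm builds.
#include <stddef.h>
#if defined(GGML_USE_HIPBLAS)
#include <hip/hip_runtime.h>
#define cudaError_t        hipError_t
#define cudaSuccess        hipSuccess
#define cudaMalloc         hipMalloc
#define cudaFree           hipFree
#define cudaGetErrorString hipGetErrorString
#else
#include <cuda_runtime.h>
#endif

// With the aliases in place, backend-agnostic code compiles unchanged
// for both CUDA and ROCm:
static void * alloc_device_buffer(size_t n) {
    void * ptr = NULL;
    cudaError_t err = cudaMalloc(&ptr, n);
    return err == cudaSuccess ? ptr : NULL;
}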
Verified that the project builds with the fixes:
apt-get update && apt-get install -y git
git clone https://github.com/bhargav/ctransformers.git
cd ctransformers
git checkout bhargav/fix_rocm_compile
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CT_HIPBLAS=1 pip install .
Build log: https://gist.github.com/bhargav/65bbbd039bda6f39504448656e88ab6b
The package installs successfully, and I was able to run model inference on the GPU.
I can confirm: it compiles without an issue now (ROCm nightly and an old Vega 64 :)
Hello,
Installation finished without errors, but prompting the model with
llm = AutoModelForCausalLM.from_pretrained("/models/llama-7b.Q3_K_M.gguf", model_type="llama", local_files_only=True, gpu_layers=100)
print(llm("AI is going to"))
exits with the error:
CUDA error 98 at ~/ctransformers/models/ggml/ggml-cuda.cu:6045: invalid device function
When using ROCm, a "CUDA error 98 ... invalid device function" (as far as I know) usually means the kernels were built for a different GPU architecture than the one the HIP stack reports. It is most likely solvable with:
export HSA_OVERRIDE_GFX_VERSION=11.0.0  # for RDNA3 GPUs
export HSA_OVERRIDE_GFX_VERSION=10.3.0  # for older ones
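To check which architecture string the HIP runtime actually reports for your card (and whether the override takes effect), a small HIP program can print each device's gcnArchName; a minimal sketch, assuming a working ROCm install with hipcc on the PATH:

// Build with: hipcc print_arch.cpp -o print_arch
#include <cstdio>
#include <hip/hip_runtime.h>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        fprintf(stderr, "no HIP devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t props;
        if (hipGetDeviceProperties(&props, i) == hipSuccess) {
            // Kernels must be compiled for an ISA compatible with this
            // string (e.g. gfx1030), or launches fail with
            // "invalid device function".
            printf("device %d: %s (%s)\n", i, props.name, props.gcnArchName);
        }
    }
    return 0;
}

Running it with and without HSA_OVERRIDE_GFX_VERSION set shows whether the override changes what the runtime reports.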
Hi, in the referenced container (docker.io/rocm/dev-ubuntu-22.04:5.6-complete) it works on a gfx1030 system:
apt show rocm-libs -a
Package: rocm-libs
Version: 5.6.0.50600-67~22.04
Priority: optional
Section: devel
But in "rocm/pytorch:latest-release"
apt show rocm-libs -a
Package: rocm-libs
Version: 5.7.0.50700-63~20.04
Priority: optional
Raises the" invalid device function" Error.
NOTE:
If you accidentally import the cloned directory instead of the installed package, you get:
OSError: libcudart.so.12: cannot open shared object file: No such file or directory
NOTE: I disabled the CPU's integrated gfx1036 GPU, so only the gfx1030 card is present.
I assume the "invalid device function" error depends on the environment's library version(s).
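One way to compare the working 5.6 container against the failing 5.7 one beyond the apt metadata is to ask the HIP runtime itself which version is loaded; a minimal sketch:

// Build with: hipcc print_version.cpp -o print_version
#include <cstdio>
#include <hip/hip_runtime.h>

int main() {
    int runtime = 0, driver = 0;
    hipRuntimeGetVersion(&runtime);  // version of the HIP runtime actually loaded
    hipDriverGetVersion(&driver);    // version the driver stack supports
    printf("HIP runtime: %d, driver: %d\n", runtime, driver);
    return 0;
}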
/lgtm
I have exported HSA_OVERRIDE_GFX_VERSION=11.0.0 and am running HSA_OVERRIDE_GFX_VERSION=11.0.0 python index.py, but I still get:
CUDA error 98 at /home/gingi/github/ctransformers/models/ggml/ggml-cuda.cu:6045: invalid device function
The command below worked well for me:
CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DLLAMA_CLBLAST=on -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH=/opt/rocm" FORCE_CMAKE=1 pip install ctransformers --no-binary ctransformers