exllama
Updates since 0.0.11 cause the code to not compile on Ubuntu (23.04, 23.10) with AMD HIP / ROCm (5.6, 5.7, 6.0, ...)
Thank you for your work. As I haven't seen this mentioned, I thought I would post it in the hope of saving others some frustration and supporting the work. I maintain this guide for AMD GPUs that use HIP / ROCm: https://github.com/nktice/AMD-AI
I've been trying to compile the source code and getting errors; I'll post those below so that you can review them. But for those just looking for a fix, I'll first post the workaround I found in the interim: pinning to the commit below works, or at least compiles for me so far in testing. [ This may help others who are having such issues avoid frustration. ]
git clone https://github.com/turboderp/exllamav2
cd exllamav2
git reset --hard a4ecea6
pip install .
Ok, and now here is the output of trying the latest version. The failure is the same across a range of ROCm versions (specifically 5.6.1, 5.7.3, and 6.0). To reproduce, use the following commands (output is below):
git clone https://github.com/turboderp/exllamav2
cd exllamav2
python setup.py build
The output is too long to paste inside the message, so it is attached: 2024-01-19-exllamav2-compile-error.txt
Protip: when faced with walls of compiler output like this, copy everything into a text editor and search for the string "error:". In this case the errors are:
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp: At global scope:
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp:267:16: error: expected initializer before ‘dec_lock’
267 | void CUDART_CB dec_lock(hipStream_t stream, hipError_t status, void *user_data)
| ^~~~~~~~
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp: In member function ‘void STFile::load(at::Tensor, size_t, size_t, bool)’:
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp:328:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute ‘nodiscard’ [-Wunused-result]
328 | hipMemcpyAsync(dst, src, copy_len, hipMemcpyHostToDevice);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.3/include/hip/hip_runtime_api.h:3883:12: note: in call to ‘hipError_t hipMemcpyAsync(void*, const void*, size_t, hipMemcpyKind, hipStream_t)’, declared here
3883 | hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind,
| ^~~~~~~~~~~~~~
/opt/rocm-5.7.3/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp:329:40: error: ‘dec_lock’ was not declared in this scope; did you mean ‘clock’?
329 | hipStreamAddCallback(NULL, dec_lock, (void*) page, 0);
| ^~~~~~~~
| clock
I have a 7900XTX on order, so I can actually start running and debugging ROCm/HIP stuff myself soon.
But in the meantime I have to assume stream callbacks don't work exactly the same in HIPified CUDA code. GPT-4 suggests that the CUDART_CB macro may not be needed. So if you wouldn't mind, could you try removing that word to see if it compiles?
So around line 266 in exllamav2/exllamav2_ext/cpp/safetensors.cpp you should have:
void dec_lock(cudaStream_t stream, cudaError_t status, void *user_data)
{
#ifdef __linux__
STPage* p = (STPage*) user_data;
p->locks--;
#endif
}
I'd love to hear if it works.
It appears that you added this to the code base - and it does work now!
And thanks for the tip; I'll try that next time I have such an issue.
[ I sent the whole log because it did something odd at the beginning -
there were some warnings about ignored packages... that looks resolved now. ]
This test used ROCm 6.0 on Ubuntu 23.04 with torch 2.3.0.20240118+rocm6.0 and flash attention 2. The model loads and answers questions. :)
Looks like this issue is currently marked open, when it should be closed.