exllama
Updates since 0.0.11 cause the code to not compile on Ubuntu (23.04, 23.10) with AMD HIP / ROCm (5.6, 5.7, 6.0, ...)
Thank you for your work. As I haven't seen this mentioned, I thought I would post it in the hope of saving others some frustration and supporting the work. I maintain this guide for AMD GPUs that use HIP / ROCm: https://github.com/nktice/AMD-AI
I've been trying to compile the source code and getting errors; I'll post those below so that you can review them. But for those just looking for a fix, I'll first post the workaround I found in the interim: pinning to the commit below works, or at least compiles for me so far in testing. [ This may help others who are having such issues avoid frustration. ]
git clone https://github.com/turboderp/exllamav2
cd exllamav2
git reset --hard a4ecea6
pip install .
Ok, and now here is the output of trying the latest version. The failure is the same across a range of ROCm versions (specifically 5.6.1, 5.7.3, and 6.0). To reproduce, use the following commands (output is below):
git clone https://github.com/turboderp/exllamav2
cd exllamav2
python setup.py build
The output is too long to paste inside the message, so it is attached: 2024-01-19-exllamav2-compile-error.txt
Protip: when faced with walls of compiler output like this, copy everything into a text editor and search for the string "error:". In this case the errors are:
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp: At global scope:
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp:267:16: error: expected initializer before ‘dec_lock’
267 | void CUDART_CB dec_lock(hipStream_t stream, hipError_t status, void *user_data)
| ^~~~~~~~
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp: In member function ‘void STFile::load(at::Tensor, size_t, size_t, bool)’:
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp:328:27: warning: ignoring returned value of type ‘hipError_t’, declared with attribute ‘nodiscard’ [-Wunused-result]
328 | hipMemcpyAsync(dst, src, copy_len, hipMemcpyHostToDevice);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-5.7.3/include/hip/hip_runtime_api.h:3883:12: note: in call to ‘hipError_t hipMemcpyAsync(void*, const void*, size_t, hipMemcpyKind, hipStream_t)’, declared here
3883 | hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind,
| ^~~~~~~~~~~~~~
/opt/rocm-5.7.3/include/hip/hip_runtime_api.h:332:3: note: ‘hipError_t’ declared here
332 | } hipError_t;
| ^~~~~~~~~~
/home/n/text-generation-webui/repositories/exllamav2/exllamav2/exllamav2_ext/cpp/safetensors_hip.cpp:329:40: error: ‘dec_lock’ was not declared in this scope; did you mean ‘clock’?
329 | hipStreamAddCallback(NULL, dec_lock, (void*) page, 0);
| ^~~~~~~~
| clock
I have a 7900XTX on order, so I can actually start running and debugging ROCm/HIP stuff myself soon.
But in the meantime I have to assume stream callbacks don't work exactly the same in HIPified CUDA code. GPT-4 suggests that the CUDART_CB macro may not be needed. So if you wouldn't mind, could you try removing that word to see if it compiles?
So around line 266 in exllamav2/exllamav2_ext/cpp/safetensors.cpp you should have:
void dec_lock(cudaStream_t stream, cudaError_t status, void *user_data)
{
#ifdef __linux__
STPage* p = (STPage*) user_data;
p->locks--;
#endif
}
I'd love to hear if it works.
It appears that you added this to the code base - and it does work now!
And thanks for the tip; I'll try that next time I have such an issue.
[ I sent the whole log because it did something odd at the beginning -
there were some warnings about ignored packages... that looks resolved now. ]
This test used ROCm 6.0 on Ubuntu 23.04 with torch 2.3.0.20240118+rocm6.0 and flash attention 2. The model loads and answers questions. :)
Looks like this issue is currently marked open, when it should be closed.