exllama
Tried to set up exllama but encountering ninja-related errors, can someone please help me?
Hello everyone
I'm trying to set up exllama on an Azure ML compute and followed the instructions at https://github.com/turboderp/exllama, but I'm getting an error when I run this command from the setup instructions:
python test_benchmark_inference.py -d <path_to_model_files> -p -ppl
I've been trying to fix the error, but I wasn't able to. I hope someone can point me in the right direction.
Here are some parts of the error message; the complete error is much, much longer.
Thank you, I'm looking forward to fixing this issue.
Traceback (most recent call last):
File "/anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/anaconda/envs/exllamav2/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
...
RuntimeError: Error building extension 'exllama_ext': [1/12] c++ -MMD -MF exllama_ext.o.d -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/batch/tasks/shared/LS_root/mounts/clusters/vm-nc48ads-a100-v4/code/Users/xxxxxxx.xxxxxxx/Sprint 114/exllama/exllama_ext -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/TH -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /anaconda/envs/exllamav2/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c '/mnt/batch/tasks/shared/LS_root/mounts/clusters/vm-nc48ads-a100-v4/code/Users/xxxxxxx.xxxxxxx/Sprint 114/exllama/exllama_ext/exllama_ext.cpp' -o exllama_ext.o
FAILED: exllama_ext.o
c++ -MMD -MF exllama_ext.o.d -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/batch/tasks/shared/LS_root/mounts/clusters/vm-nc48ads-a100-v4/code/Users/xxxxxxx.xxxxxxx/Sprint 114/exllama/exllama_ext -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/TH -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /anaconda/envs/exllamav2/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c '/mnt/batch/tasks/shared/LS_root/mounts/clusters/vm-nc48ads-a100-v4/code/Users/xxxxxxx.xxxxxxx/Sprint 114/exllama/exllama_ext/exllama_ext.cpp' -o exllama_ext.o
c++: error: 114/exllama/exllama_ext: No such file or directory
Here are the compute's details: https://learn.microsoft.com/en-us/azure/virtual-machines/nc-a100-v4-series
- vCPU: 48
- Memory: 440 GB
- Temp Disk: 128 GB
- GPU: 2 x A100 (2x 80GB VRAM)
Notes:
- The ML compute is brand new (newly spun up).
- I am not a Linux/Ubuntu pro. I have some knowledge but need help with problems like this.
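There is a space character in the path to exllama_ext (the "Sprint 114" directory). The -I include flag in the compile command is not quoted, so everything after the space is treated as a separate argument, which is why the compiler reports "114/exllama/exllama_ext: No such file or directory".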
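To illustrate why the space matters, here is a minimal sketch that uses Python's `shlex.split` as a stand-in for POSIX shell word splitting. The path is a shortened, hypothetical stand-in for the real one, and `split_include_flag` and `has_whitespace` are helper names made up for this example, not part of exllama or PyTorch.

```python
import shlex

# Hypothetical, shortened stand-in for the real checkout path.
# Only the space in "Sprint 114" matters for this illustration.
ext_dir = "/home/user/Sprint 114/exllama/exllama_ext"


def split_include_flag(path: str) -> list[str]:
    """Return the tokens a POSIX shell would see for an unquoted -I<path> flag."""
    return shlex.split(f"-I{path}")


def has_whitespace(path: str) -> bool:
    """Return True if the path contains any whitespace character."""
    return any(ch.isspace() for ch in path)


tokens = split_include_flag(ext_dir)
print(tokens)
# The second token has the same shape as the path in the compiler error:
# "114/exllama/exllama_ext: No such file or directory"
print(has_whitespace(ext_dir))
```

If this is indeed the cause, moving or re-cloning the repository into a directory whose full path has no spaces (for example, renaming "Sprint 114" to "Sprint_114") should avoid the split. That is an assumption based on the error message above, not something confirmed in this thread.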
Try fixing the header files for Python. This helped me: here
@BwandoWando did you find a workaround?