Tried to set up exllama but encountering ninja-related errors, can someone please help me?

Open BwandoWando opened this issue 2 years ago • 3 comments

Hello everyone

I'm trying to set up exllama on an Azure ML compute and I followed the instructions here https://github.com/turboderp/exllama, but unfortunately I'm getting an error when I run this command from the setup instructions:

python test_benchmark_inference.py -d <path_to_model_files> -p -ppl

I've been trying to fix the error, but unfortunately I haven't been able to. I hope someone can point me in the right direction.

Here are some parts of the error message; the complete error is much, much longer.

Thank you, and I'm looking forward to fixing this issue.

Traceback (most recent call last):
  File "/anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/anaconda/envs/exllamav2/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

...

RuntimeError: Error building extension 'exllama_ext': [1/12] c++ -MMD -MF exllama_ext.o.d -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/batch/tasks/shared/LS_root/mounts/clusters/vm-nc48ads-a100-v4/code/Users/xxxxxxx.xxxxxxx/Sprint 114/exllama/exllama_ext -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/TH -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /anaconda/envs/exllamav2/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c '/mnt/batch/tasks/shared/LS_root/mounts/clusters/vm-nc48ads-a100-v4/code/Users/xxxxxxx.xxxxxxx/Sprint 114/exllama/exllama_ext/exllama_ext.cpp' -o exllama_ext.o 
FAILED: exllama_ext.o 
c++ -MMD -MF exllama_ext.o.d -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/mnt/batch/tasks/shared/LS_root/mounts/clusters/vm-nc48ads-a100-v4/code/Users/xxxxxxx.xxxxxxx/Sprint 114/exllama/exllama_ext -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/TH -isystem /anaconda/envs/exllamav2/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /anaconda/envs/exllamav2/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c '/mnt/batch/tasks/shared/LS_root/mounts/clusters/vm-nc48ads-a100-v4/code/Users/xxxxxxx.xxxxxxx/Sprint 114/exllama/exllama_ext/exllama_ext.cpp' -o exllama_ext.o 
c++: error: 114/exllama/exllama_ext: No such file or directory

Here are the compute's details: https://learn.microsoft.com/en-us/azure/virtual-machines/nc-a100-v4-series

  • vCPU: 48
  • Memory: 440 GB
  • Temp Disk: 128 GB
  • GPU: 2 x A100 (2x 80GB VRAM)

Notes:

  • The ML compute is brand new (newly spun up).
  • I am not a Linux/Ubuntu pro. I have some knowledge but need help with problems like this.

BwandoWando avatar Aug 22 '23 05:08 BwandoWando

There is a space character in the path to exllama_ext ("Sprint 114").

evg-tyurin avatar Sep 04 '23 08:09 evg-tyurin
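To illustrate why the space matters, here is a minimal sketch that tokenizes a shortened version of the failing command the way a POSIX shell would. The `/mnt/code/...` path is a made-up abbreviation of the real one, and `shlex.split` is used only as a stand-in for shell word splitting. The point is that the `-I` include path is not quoted in the logged command, while the source file path is.

```python
import shlex

# Shortened, hypothetical version of the compiler command from the log.
# The -I path is unquoted; the .cpp path is wrapped in single quotes.
command = (
    "c++ -I/mnt/code/Sprint 114/exllama/exllama_ext "
    "-c '/mnt/code/Sprint 114/exllama/exllama_ext/exllama_ext.cpp' "
    "-o exllama_ext.o"
)

# shlex.split follows POSIX shell quoting rules, so it splits the
# unquoted include path at the space but keeps the quoted path whole.
args = shlex.split(command)

for arg in args:
    print(repr(arg))
```

The unquoted include path splits into `-I/mnt/code/Sprint` and a stray `114/exllama/exllama_ext` argument, which matches the `c++: error: 114/exllama/exllama_ext: No such file or directory` line in the log. The quoted source path survives as one argument.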

Try fixing the header files for Python. This helped me: here

guialfaro053 avatar Sep 13 '23 02:09 guialfaro053

@BwandoWando did you find a workaround?

qcapista avatar Sep 25 '23 01:09 qcapista
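If the space in "Sprint 114" is indeed the cause, one possible workaround is to build from a copy of the repository whose path has no whitespace. The sketch below is an assumption-laden illustration, not something confirmed in this thread: `has_whitespace` and `copy_to_space_free_dir` are hypothetical helper names, and the destination directory is whatever space-free location you choose.

```python
import shutil
from pathlib import Path


def has_whitespace(path: Path) -> bool:
    """Return True if any character in the path is whitespace."""
    return any(ch.isspace() for ch in str(path))


def copy_to_space_free_dir(src: Path, dest_parent: Path) -> Path:
    """Copy the directory `src` into `dest_parent` and return the new path.

    Hypothetical helper: raises ValueError if the resulting path would
    still contain whitespace, since that would not avoid the problem.
    """
    dest = dest_parent / src.name
    if has_whitespace(dest):
        raise ValueError(f"destination still contains whitespace: {dest}")
    shutil.copytree(src, dest)
    return dest


# Example usage (paths are placeholders, adjust to your environment):
# repo = Path.cwd()
# if has_whitespace(repo):
#     new_repo = copy_to_space_free_dir(repo, Path.home() / "builds")
#     print(f"Run the benchmark script from {new_repo} instead")
```

Renaming the "Sprint 114" folder to something like "Sprint_114" should have the same effect without copying anything.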