exllama
Compiling issue on Sagemaker
Has anyone had success compiling on SageMaker? There is probably a lot more for me to explore, but I wanted to check whether anyone has faced the same issue.
I loaded the standard Python 3.10 image on ml.g4dn.xlarge (Tesla T4), then ran:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
!git clone https://github.com/turboderp/exllama
!pip install -r exllama/requirements.txt
!python exllama/test_benchmark_inference.py -d ./Combined3b -p -ppl
The error I get is:
Successfully preprocessed all matching files.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/usr/local/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/exllama/test_benchmark_inference.py", line 1, in
I'm not familiar with SageMaker, but `subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.` implies that the ninja build tool isn't installed.
If you add `pip install ninja`, does that make things work?
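One quick way to test that guess from the notebook is to look for `ninja` on `PATH` before reinstalling anything. This is only a minimal sketch using the Python standard library; the helper name `ninja_version` is made up for illustration and is not part of exllama or torch.

```python
import shutil
import subprocess


def ninja_version():
    """Return the version string of the ninja found on PATH, or None if it is missing."""
    path = shutil.which("ninja")
    if path is None:
        return None
    # `ninja --version` prints just the version and exits with status 0.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = ninja_version()
if version is None:
    print("ninja not found on PATH; try `pip install ninja`")
else:
    print(f"ninja {version} found")
```

If this reports a version, ninja itself is present and the build failure has to come from somewhere else.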
I'll give it a try tomorrow and report back. Thank you for the suggestion.
OK, so I got ninja, but there is no build.ninja:
`ninja -v
ninja: error: loading 'build.ninja': No such file or directory
ninja --version
1.11.1.git.kitware.jobserver-1`
I'll keep digging when I have some time off.
Edit: By the way, I tried running it locally, and it worked without any additional configuration, apart from the limits of my 4GB GPU, of course. The problem might be related to a permissions issue with the torch folder I have on SageMaker.
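A note on the `build.ninja` message: ninja looks for `build.ninja` in the current working directory, so running `ninja -v` by hand from an arbitrary folder reports that error regardless of whether the torch build is healthy. torch writes its own `build.ninja` into the extension build directory when it compiles the extension. To probe the permissions theory, the sketch below checks whether that directory is writable. It assumes the build directory is taken from the `TORCH_EXTENSIONS_DIR` environment variable and otherwise falls back to `~/.cache/torch_extensions`; the fallback path and both helper names are assumptions for illustration.

```python
import os
import tempfile


def extensions_dir():
    """Directory where torch is assumed to build JIT-compiled extensions."""
    # TORCH_EXTENSIONS_DIR is read by torch.utils.cpp_extension; the
    # fallback below is an assumption about the default location on Linux.
    return os.environ.get(
        "TORCH_EXTENSIONS_DIR",
        os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions"),
    )


def is_writable(path):
    """True if `path`, or its nearest existing parent, is a writable directory."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            return False
        path = parent
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


build_dir = extensions_dir()
if is_writable(build_dir):
    print(f"{build_dir} looks writable")
else:
    # Point torch at a fresh temporary directory instead.
    fallback = tempfile.mkdtemp(prefix="torch_extensions_")
    os.environ["TORCH_EXTENSIONS_DIR"] = fallback
    print(f"{build_dir} is not writable; using {fallback} instead")
```

The variable has to be set before the extension is built. Commands started with `!` from the same notebook should inherit it, though that is worth confirming on the SageMaker image in question.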
Is your sagemaker instance running linux?
Maybe you need to also include:
!sudo apt-get install -y ninja-build
Unsure why you are using the ROCm build of torch when you are on an NVIDIA Tesla T4, but try using the normal (CUDA) version.
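To confirm which build of torch is actually installed, the version metadata can be inspected from the notebook. A minimal sketch, assuming torch exposes `torch.version.cuda` and `torch.version.hip` as it does in recent releases; `classify_build` is a hypothetical helper, not a torch function.

```python
def classify_build(cuda_version, hip_version):
    """Classify a torch build from its CUDA and HIP (ROCm) version strings."""
    if hip_version:
        return "rocm"
    if cuda_version:
        return "cuda"
    return "cpu"


try:
    import torch
except ImportError:
    print("torch is not installed")
else:
    # On ROCm wheels torch.version.hip is set and torch.version.cuda is None.
    kind = classify_build(torch.version.cuda, getattr(torch.version, "hip", None))
    print(f"torch {torch.__version__}: {kind} build, "
          f"GPU visible: {torch.cuda.is_available()}")
```

If this prints `rocm` on a T4 instance, reinstalling torch without the `rocm5.4.2` index URL is the obvious next step, since the default Linux wheel from PyPI is a CUDA build.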
> Is your sagemaker instance running linux?
> Maybe you need to also include:
> !sudo apt-get install -y ninja-build
Installed ninja-build, but it didn't solve the issue.
> Unsure why you are using the rocm torch version when you are using an nvidia tesla T4, but try using the normal version.
Worth a try. In my experience, setting up CUDA to work on SageMaker can be a bit tricky, so I tend to go for the pre-built GPU-optimized images.
Thanks for helping out
No problem, not sure what the other guy meant with ninja... the fact that you're getting an error code and message from ninja means you have it installed at the very least. As far as I know, ROCm torch isn't meant for nvidia cards. Nvidia cards get their own special treatment with cuda :)
Closing the issue for now, since it's a runtime environment issue. I'll update with the solution if I get around to fixing it.