RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist
The below error occurred while trying to convert a model to GGUF format.
I noticed that the quantize folder resides in llama.cpp/examples/quantize.
RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?
# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
Weird - I just tried it in the last hour and it works.
It looks like we need to first run make in the llama.cpp folder manually; not sure why it stopped working in Unsloth (https://github.com/ggerganov/llama.cpp/issues/8107). If you run the make command in the llama.cpp folder, it will work.
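For anyone hitting this before a fix lands, a minimal sketch of that manual build (assuming Unsloth already cloned llama.cpp into the current working directory):
cd llama.cpp
make clean
make all -j    # builds the llama-quantize binary among others
cd ..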
Weird it stopped working? Hmm I shall try this in Colab and report back!
I have the same problem. Is there a solution now?
RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist. But we expect this file to exist! Maybe the llama.cpp developers changed the name?
It should function - are you using Colab?
Well, mine is as follows: NVIDIA V100, Driver Version: 535.146.02, CUDA Version: 12.1
I temporarily solved this problem by rolling back llama.cpp:
cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1
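After the build, you can sanity-check that the binary Unsloth looks for now exists (llama-quantize on newer checkouts, quantize on older ones):
ls llama-quantize quantize    # at least one of these should exist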
@danielhanchen Yes, I am using Colab, but I am still having the same error.
Wait, weird - I just ran it with no errors in Colab. It's best to use our updated notebooks on our GitHub and start afresh.
@Zhangy-ly That is an effective workaround.
cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1
To anyone having errors while using those bash commands in a notebook: put ! before each command.
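For example, the rollback above as a Colab cell might look like this (a sketch; note that each ! line spawns its own shell, so the cd has to be chained with &&):
!cd llama.cpp && git checkout b3345 && git submodule update --init --recursive
!cd llama.cpp && make clean && make all -j
!cd llama.cpp && git log -1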
Wait so the issue persists? Are people using Colab / Runpod?
Hi Daniel,
Thank you for your response.
To clarify, the issue persists on my Ubuntu setup, although it runs without problems on Colab. Is there any other information you need to help diagnose the issue? Please let me know.
Ubuntu, NVIDIA V100, Driver Version: 535.146.02, CUDA Version: 12.1
packages in environment:
Name Version
_libgcc_mutex 0.1
_openmp_mutex 5.1
accelerate 0.32.1
aiohttp 3.9.5
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.43.2
blas 1.0
brotli-python 1.0.9
bzip2 1.0.8
ca-certificates 2024.3.11
certifi 2024.7.4
charset-normalizer 2.0.4
cuda-cudart 12.1.105
cuda-cupti 12.1.105
cuda-libraries 12.1.0
cuda-nvrtc 12.1.105
cuda-nvtx 12.1.105
cuda-opencl 12.5.39
cuda-runtime 12.1.0
cuda-version 12.5
datasets 2.20.0
dill 0.3.8
docstring-parser 0.16
ffmpeg 4.3
filelock 3.13.1
freetype 2.12.1
frozenlist 1.4.1
fsspec 2024.2.0
gguf 0.9.1
gmp 6.2.1
gmpy2 2.1.2
gnutls 3.6.15
huggingface-hub 0.23.4
idna 3.7
intel-openmp 2023.1.0
jinja2 3.1.3
jpeg 9e
lame 3.100
lcms2 2.12
ld_impl_linux-64 2.38
lerc 3.0
libcublas 12.1.0.26
libcufft 11.0.2.4
libcufile 1.10.1.7
libcurand 10.3.6.82
libcusolver 11.4.4.55
libcusparse 12.0.2.55
libdeflate 1.17
libffi 3.4.4
libgcc-ng 11.2.0
libgomp 11.2.0
libiconv 1.16
libidn2 2.3.4
libjpeg-turbo 2.0.0
libnpp 12.0.2.50
libnvjitlink 12.1.105
libnvjpeg 12.1.1.14
libpng 1.6.39
libstdcxx-ng 11.2.0
libtasn1 4.19.0
libtiff 4.5.1
libunistring 0.9.10
libuuid 1.41.5
libwebp-base 1.3.2
llvm-openmp 14.0.6
lz4-c 1.9.4
markdown-it-py 3.0.0
markupsafe 2.1.5
mdurl 0.1.2
mkl 2023.1.0
mkl-service 2.4.0
mkl_fft 1.3.8
mkl_random 1.2.4
mpc 1.1.0
mpfr 4.0.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
ncurses 6.4
nettle 3.7.3
networkx 3.2.1
numpy 1.26.4
numpy-base 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.1.105
nvidia-nvtx-cu12 12.1.105
openh264 2.1.1
openjpeg 2.4.0
openssl 3.0.14
packaging 24.1
pandas 2.2.2
peft 0.11.1
pillow 10.3.0
pip 24.0
protobuf 3.20.3
psutil 6.0.0
pyarrow 16.1.0
pyarrow-hotfix 0.6
pygments 2.18.0
pysocks 1.7.1
python 3.10.13
python-dateutil 2.9.0.post0
pytorch-cuda 12.1
pytorch-mutex 1.0
pytz 2024.1
pyyaml 6.0.1
readline 8.2
regex 2024.5.15
requests 2.32.2
rich 13.7.1
safetensors 0.4.3
sentencepiece 0.2.0
setuptools 69.5.1
shtab 1.7.1
six 1.16.0
sqlite 3.45.3
sympy 1.12
tbb 2021.8.0
tk 8.6.14
tokenizers 0.19.1
torch 2.2.0+cu121
torchaudio 2.2.0
torchvision 0.17.0
tqdm 4.66.4
transformers 4.43.1
triton 2.2.0
trl 0.8.6
typing-extensions 4.9.0
tyro 0.8.5
tzdata 2024.1
unsloth 2024.7
urllib3 2.2.2
wheel 0.43.0
xformers 0.0.24
xxhash 3.4.1
xz 5.4.6
yaml 0.2.5
yarl 1.9.4
zlib 1.2.13
zstd 1.5.5
The workaround from @Zhangy-ly (checking out b3345 and rebuilding) solved my situation. There are no llama-quantize or quantize files in the newest git source (08/07/2024), so unslothai should pin a specific version of llama.cpp to fix this issue. Thank you! ;)
Same problem here. The tip above - running make in the llama.cpp folder first (see ggerganov/llama.cpp#8107) - solved the issue:
$ cd llama.cpp
make
Manually running make works; it generates llama.cpp/llama-quantize.
Hmm, I might have to take another look at why it's not working - maybe my calling mechanisms aren't functioning correctly.
On Windows I needed to remove the .exe extension from llama-quantize.exe, and then I got the following output (the chat-template metadata at the top is truncated):
%}{{'<|im_start|>user
' + message['content'] + '<|im_end|>
'}}{% elif message['role'] == 'assistant' %}{{'<|im_start|>assistant
' + message['content'] + '<|im_end|>
' }}{% else %}{{ '<|im_start|>system
' + message['content'] + '<|im_end|>
' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}{% else %}{% for message in messages %}{% if message['from'] == 'human' %}{{'<|im_start|>user
' + message['value'] + '<|im_end|>
'}}{% elif message['from'] == 'gpt' %}{{'<|im_start|>assistant
' + message['value'] + '<|im_end|>
' }}{% else %}{{ '<|im_start|>system
' + message['value'] + '<|im_end|>
' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}{% endif %}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:model\unsloth.BF16.gguf: n_tensors = 339, total_size = 15.2G
Writing: 100%|██████████| 15.2G/15.2G [01:00<00:00, 250Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to model\unsloth.BF16.gguf
Unsloth: Conversion completed! Output location: ./model/unsloth.BF16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This will take 20 minutes...
'.' is not recognized as an internal or external command,
operable program or batch file.
RuntimeError: Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
A bit of a noob here, but I have a workaround. I had built llama.cpp with VS2022 using CMake, which left a llama.cpp\bin\Releases folder with the resulting DLL and EXE files, which Unsloth couldn't find. Simply copying that whole folder's contents into llama.cpp, so that llama.cpp\llama-quantize exists, worked. I was initially confused as to what exactly Unsloth was looking for.
Sorry about the issues with llama.cpp :( I might actually write a section with exact details on how to set up llama.cpp properly.
Same for me
File ~/anaconda3/envs/finetuning/lib/python3.10/site-packages/unsloth/save.py:975, in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)
    973     quantize_location = "llama.cpp/llama-quantize"
    974 else:
--> 975     raise RuntimeError(
    976         "Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.\n"\
    977         "But we expect this file to exist! Maybe the llama.cpp developers changed the name?"
    978     )
...
    981 # See https://github.com/unslothai/unsloth/pull/730
    982 # Filenames changed again!
RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?
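For what it's worth, a quick way to check which binary is present - these are the two paths save.py tests in the traceback above:
ls -l llama.cpp/llama-quantize llama.cpp/quantize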
@Antonytm Would https://github.com/unslothai/unsloth/wiki#manually-saving-to-gguf be helpful? Sorry on the delay!
@danielhanchen yes! It works. 👍
I tried building the same with CMake but the EXEs and DLLs are not getting generated. I manually copied the DLLs and EXEs from the release builds but I get the same issue. I then converted the model to GGUF manually:
python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16
The model file gets generated, but on creating a model with Ollama from the GGUF file I get the following error:
`C:\Users\Desktop\New folder>ollama create unsloth_m -f "C:\Users\Desktop\New folder\op.gguf"
Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"`
Please help.
The rollback workaround above saved me work, thanks.
It seems there is something wrong with your Modelfile; usually the top line is FROM model_name.gguf.
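For reference, a minimal sketch of that fix in shell form (hypothetical file names; ollama create's -f flag expects a Modelfile, not the .gguf itself - on Windows, create the Modelfile in a text editor instead):
printf 'FROM ./op.gguf\n' > Modelfile    # minimal Modelfile referencing the GGUF
ollama create unsloth_m -f Modelfile
Note that convert_lora_to_gguf.py emits a LoRA adapter rather than a full model, so the Modelfile may instead need a base model in FROM plus an ADAPTER line pointing at op.gguf.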
@jainpradeep Windows, right? Also apologies for the delay - the Modelfile should look like https://github.com/ollama/ollama/blob/main/docs/modelfile.md and building llama.cpp on Windows can be tough - see https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md
I was planning to add more stable support for Windows in the future
The solution provided by @Zhangy-ly (checking out the b3345 llama.cpp branch) doesn't seem to work anymore. I used cmake instead, as advised in the updated build documentation:
git checkout b3345
git submodule update --init --recursive
cmake -B build
cmake --build build --config Release
git log -1
CMake generates an out-of-source build by default, meaning the build artifacts (compiled binaries, etc.) are placed in a separate build folder (e.g., build/Release) instead of the source folder (llama.cpp). I copied all the binaries in the build folder to the root folder and re-ran the Unsloth Llama_3_2_1B+3B_Conversational_+_2x_faster_finetuning Colab, but I still get the same error:
RuntimeError( "Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.\n"\ "But we expect this file to exist! Maybe the llama.cpp developers changed the name?" )
As an alternate workaround I tried converting the model to GGUF manually using:
python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16
but the generated output file doesn't work with ollama:
`Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"`
The merged model files, as suggested by @danielhanchen, are in order: the config and safetensors files are present in the folder, and there are no errors while generating the merged model.
Can someone please suggest how I can use the model in Ollama without converting it to GGUF? I have been trying to get this to work for a month. There were many issues: corporate proxy, SSL, timeouts, dependency versions, and building llama.cpp itself (I tried make, cmake, ninja, VS2022 - everything), but I am stuck on the final step of getting the model to work with Ollama so I can use it in openweb-ui.
Please suggest what am I doing wrong?
I got this issue on Ubuntu, and the following steps worked for me (see the sketch after this list):
- Build manually with CMake (it seems that make does not work anymore), following the llama.cpp build instructions.
- The above creates executable files (llama-*) under llama.cpp/build/bin. Copy them to directly under llama.cpp/.
- Re-run the failed Unsloth call (in my case, push_to_hub_gguf).
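A minimal sketch of those steps (assuming the binaries land in build/bin, per the llama.cpp build docs):
cd llama.cpp
cmake -B build
cmake --build build --config Release
cp build/bin/llama-* .    # copy the executables to where Unsloth looks for them
cd ..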
It works for me.
In the Colab env I used:
!(cd llama.cpp; cmake -B build; cmake --build build --config Release)
Then I copied the executables to the /content/llama.cpp directory with cp and re-ran the cell.
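The copy step might look like this (a sketch, assuming the default /content/llama.cpp path used above):
!cp /content/llama.cpp/build/bin/llama-* /content/llama.cpp/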
Thanks. Same problem here; following these instructions solved it. It seems that Unsloth's llama.cpp compilation logic should be updated.