
RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist

okoliechykwuka opened this issue 1 year ago • 33 comments

The error below occurred while trying to convert a model to GGUF format.

I noticed that the quantize folder resides in llama.cpp/examples/quantize

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
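For reference, the failing check boils down to looking for either binary name at the root of the llama.cpp checkout. A minimal sketch of that logic (`find_quantize_binary` is a hypothetical helper for illustration, not unsloth's actual code):

```python
import os

def find_quantize_binary(llama_dir: str = "llama.cpp") -> str:
    # llama.cpp renamed `quantize` to `llama-quantize` in mid-2024,
    # so check both names, newest first.
    for name in ("llama-quantize", "quantize"):
        path = os.path.join(llama_dir, name)
        if os.path.exists(path):
            return path
    raise RuntimeError(
        f"No quantize binary found in {llama_dir!r}; "
        "build llama.cpp first (e.g. with make or cmake)."
    )
```

If neither file exists at the repo root (for example, because the binaries ended up under build/bin), this raises the error seen above.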

okoliechykwuka avatar Jul 09 '24 14:07 okoliechykwuka

Weird I just tried it in the last hour and it works

danielhanchen avatar Jul 10 '24 09:07 danielhanchen

It looks like we need to first run make in the llama.cpp folder manually; not sure why it stopped working in unsloth (https://github.com/ggerganov/llama.cpp/issues/8107). If you run the make command in the llama.cpp folder, it will work.

scherbakovdmitri avatar Jul 10 '24 16:07 scherbakovdmitri

Weird it stopped working? Hmm I shall try this in Colab and report back!

danielhanchen avatar Jul 12 '24 06:07 danielhanchen

I have the same problem. Is there a solution now?

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist. But we expect this file to exist! Maybe the llama.cpp developers changed the name?

Zhangy-ly avatar Jul 19 '24 05:07 Zhangy-ly

It should function - are you using Colab?

danielhanchen avatar Jul 19 '24 06:07 danielhanchen

It should function - are you using Colab?

Well, mine is as follows: NVIDIA V100, Driver Version: 535.146.02, CUDA Version: 12.1

I temporarily solved this problem by rolling back llama.cpp

$ cd llama.cpp
$ git checkout b3345
$ git submodule update --init --recursive
$ make clean
$ make all -j
$ git log -1

Zhangy-ly avatar Jul 19 '24 06:07 Zhangy-ly

@danielhanchen Yes, I am using colab, but I am still having the same error.

okoliechykwuka avatar Jul 19 '24 16:07 okoliechykwuka

Wait, weird, I just ran it with no errors in Colab. It's best to use our updated notebooks on our GitHub and start afresh.

danielhanchen avatar Jul 20 '24 20:07 danielhanchen

@Zhangy-ly That is an effective workaround.

$ cd llama.cpp
$ git checkout b3345
$ git submodule update --init --recursive
$ make clean
$ make all -j
$ git log -1

Deluxer avatar Jul 28 '24 22:07 Deluxer

@Zhangy-ly That is an effective workaround.

$ cd llama.cpp
$ git checkout b3345
$ git submodule update --init --recursive
$ make clean
$ make all -j
$ git log -1

To anyone getting errors while running those bash commands in a notebook: prefix each command with !
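One caveat with the ! prefix: each ! line in a Colab cell runs in its own subshell, so a bare `cd` does not carry over to the next line. A quick demonstration of the behavior in a temp directory:

```shell
# Each "!" line in Colab runs in its own subshell, so `cd` does not
# persist to the next command. Demonstration using a temp directory:
workdir=$(mktemp -d)
(cd "$workdir" && pwd)   # the cd takes effect only inside this subshell
pwd                      # afterwards, still the original directory
```

Hence the workaround is usually written as a single chained line, e.g. `!cd llama.cpp && make clean && make all -j`.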

theodufort avatar Aug 01 '24 13:08 theodufort

Wait so the issue persists? Are people using Colab / Runpod?

danielhanchen avatar Aug 02 '24 06:08 danielhanchen

Wait so the issue persists? Are people using Colab / Runpod?

Hi Daniel,

Thank you for your response.

To clarify, the issue persists on my Ubuntu setup, although it seems to run without problems on Colab. Is there any other information you need to help diagnose the issue? Please let me know.

Ubuntu, NVIDIA V100, Driver Version: 535.146.02, CUDA Version: 12.1

packages in environment:

Name Version

_libgcc_mutex 0.1
_openmp_mutex 5.1
accelerate 0.32.1
aiohttp 3.9.5
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.43.2
blas 1.0
brotli-python 1.0.9
bzip2 1.0.8
ca-certificates 2024.3.11
certifi 2024.7.4
charset-normalizer 2.0.4
cuda-cudart 12.1.105
cuda-cupti 12.1.105
cuda-libraries 12.1.0
cuda-nvrtc 12.1.105
cuda-nvtx 12.1.105
cuda-opencl 12.5.39
cuda-runtime 12.1.0
cuda-version 12.5
datasets 2.20.0
dill 0.3.8
docstring-parser 0.16
ffmpeg 4.3
filelock 3.13.1
freetype 2.12.1
frozenlist 1.4.1
fsspec 2024.2.0
gguf 0.9.1
gmp 6.2.1
gmpy2 2.1.2
gnutls 3.6.15
huggingface-hub 0.23.4
idna 3.7
intel-openmp 2023.1.0
jinja2 3.1.3
jpeg 9e
lame 3.100
lcms2 2.12
ld_impl_linux-64 2.38
lerc 3.0
libcublas 12.1.0.26
libcufft 11.0.2.4
libcufile 1.10.1.7
libcurand 10.3.6.82
libcusolver 11.4.4.55
libcusparse 12.0.2.55
libdeflate 1.17
libffi 3.4.4
libgcc-ng 11.2.0
libgomp 11.2.0
libiconv 1.16
libidn2 2.3.4
libjpeg-turbo 2.0.0
libnpp 12.0.2.50
libnvjitlink 12.1.105
libnvjpeg 12.1.1.14
libpng 1.6.39
libstdcxx-ng 11.2.0
libtasn1 4.19.0
libtiff 4.5.1
libunistring 0.9.10
libuuid 1.41.5
libwebp-base 1.3.2
llvm-openmp 14.0.6
lz4-c 1.9.4
markdown-it-py 3.0.0
markupsafe 2.1.5
mdurl 0.1.2
mkl 2023.1.0
mkl-service 2.4.0
mkl_fft 1.3.8
mkl_random 1.2.4
mpc 1.1.0
mpfr 4.0.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
ncurses 6.4
nettle 3.7.3
networkx 3.2.1
numpy 1.26.4
numpy-base 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.1.105
nvidia-nvtx-cu12 12.1.105
openh264 2.1.1
openjpeg 2.4.0
openssl 3.0.14
packaging 24.1
pandas 2.2.2
peft 0.11.1
pillow 10.3.0
pip 24.0
protobuf 3.20.3
psutil 6.0.0
pyarrow 16.1.0
pyarrow-hotfix 0.6
pygments 2.18.0
pysocks 1.7.1
python 3.10.13
python-dateutil 2.9.0.post0
pytorch-cuda 12.1
pytorch-mutex 1.0
pytz 2024.1
pyyaml 6.0.1
readline 8.2
regex 2024.5.15
requests 2.32.2
rich 13.7.1
safetensors 0.4.3
sentencepiece 0.2.0
setuptools 69.5.1
shtab 1.7.1
six 1.16.0
sqlite 3.45.3
sympy 1.12
tbb 2021.8.0
tk 8.6.14
tokenizers 0.19.1
torch 2.2.0+cu121
torchaudio 2.2.0
torchvision 0.17.0
tqdm 4.66.4
transformers 4.43.1
triton 2.2.0
trl 0.8.6
typing-extensions 4.9.0
tyro 0.8.5
tzdata 2024.1
unsloth 2024.7
urllib3 2.2.2
wheel 0.43.0
xformers 0.0.24
xxhash 3.4.1
xz 5.4.6
yaml 0.2.5
yarl 1.9.4
zlib 1.2.13
zstd 1.5.5

Zhangy-ly avatar Aug 02 '24 06:08 Zhangy-ly

@Zhangy-ly That is an effective workaround.

$ cd llama.cpp
$ git checkout b3345
$ git submodule update --init --recursive
$ make clean
$ make all -j
$ git log -1

solved my situation. There are no llama-quantize or quantize files in the newest git source (08/07/2024), so unsloth should pin a specific version of llama.cpp to fix this issue. Thank you! ;)

jeehunseo avatar Aug 07 '24 08:08 jeehunseo

It looks like we need to first run make in the llama.cpp folder manually; not sure why it stopped working in unsloth (ggerganov/llama.cpp#8107). If you run the make command in the llama.cpp folder, it will work.

Same problem here. This tip solved the issue.

$ cd llama.cpp
make

thyarles avatar Aug 13 '24 19:08 thyarles

Manually running make works; it generates llama.cpp/llama-quantize

$ cd llama.cpp
make

It looks like we need to first run make in the llama.cpp folder manually; not sure why it stopped working in unsloth (ggerganov/llama.cpp#8107). If you run the make command in the llama.cpp folder, it will work.

Same problem here. This tip solved the issue.

$ cd llama.cpp
make

yuxiaojian avatar Aug 20 '24 11:08 yuxiaojian

Hmm, I might have to take another look at why it's not working; maybe my calling mechanisms aren't functioning correctly.

danielhanchen avatar Aug 24 '24 00:08 danielhanchen

On Windows I needed to remove the .exe extension from llama-quantize.exe, and then:

(Jinja chat template output from the conversion log omitted)
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:model\unsloth.BF16.gguf: n_tensors = 339, total_size = 15.2G
Writing: 100%|██████████| 15.2G/15.2G [01:00<00:00, 250Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to model\unsloth.BF16.gguf
Unsloth: Conversion completed! Output location: ./model/unsloth.BF16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This will take 20 minutes...
'.' is not recognized as an internal or external command,
operable program or batch file.
RuntimeError: Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.

whisper-bye avatar Aug 29 '24 15:08 whisper-bye

A bit of a noob here, but I have a workaround. I had built llama.cpp with VS2022 using cmake. I had a llama.cpp\bin\Releases with the resulting dll and exe files, which unsloth couldn't find. Simply copying that whole folder to llama.cpp\llama-quantize worked. I was initially confused as to what exactly unsloth was looking for.

throttlekitty avatar Sep 09 '24 03:09 throttlekitty

Sorry about the issues with llama.cpp :( I might actually write a section with exact details on how to set up llama.cpp properly

danielhanchen avatar Sep 10 '24 08:09 danielhanchen

Same for me

File ~/anaconda3/envs/finetuning/lib/python3.10/site-packages/unsloth/save.py:975, in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)
    973     quantize_location = "llama.cpp/llama-quantize"
    974 else:
--> 975     raise RuntimeError(
    976         "Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.\n"
    977         "But we expect this file to exist! Maybe the llama.cpp developers changed the name?"
    978     )
...
    981 # See https://github.com/unslothai/unsloth/pull/730
    982 # Filenames changed again!

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

Antonytm avatar Sep 21 '24 20:09 Antonytm

@Antonytm Would https://github.com/unslothai/unsloth/wiki#manually-saving-to-gguf be helpful? Sorry on the delay!

danielhanchen avatar Oct 01 '24 08:10 danielhanchen

@danielhanchen yes! It works. 👍

rorubyy avatar Oct 09 '24 01:10 rorubyy

I tried building the same with cmake, but the exes and DLLs are not getting generated. I manually copied the DLLs and exes from the release builds, but I get the same issue. I then converted the model to GGUF manually:

python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16

The model file gets generated, but on creating a model with ollama from the GGUF file I get the following error:

`C:\Users\Desktop\New folder>ollama create unsloth_m -f "C:\Users\Desktop\New folder\op.gguf"

Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"`

Please help.

jainpradeep avatar Dec 06 '24 12:12 jainpradeep

git log -1

saved me some work, thanks

lastrei avatar Dec 10 '24 14:12 lastrei

I tried building the same with cmake, but the exes and DLLs are not getting generated. I manually copied the DLLs and exes from the release builds, but I get the same issue. I then converted the model to GGUF manually:

python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16

The model file gets generated, but on creating a model with ollama from the GGUF file I get the following error:

`C:\Users\Desktop\New folder>ollama create unsloth_m -f "C:\Users\Desktop\New folder\op.gguf"

Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"`

Please help.

It seems there is something wrong with your Modelfile; usually the top line is FROM model_name.gguf
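The error above suggests ollama is parsing the GGUF binary itself as a Modelfile: `ollama create -f` expects a Modelfile whose FROM line points at the GGUF, not the GGUF directly. A minimal sketch, reusing the op.gguf name from the example above:

```
# Modelfile (saved next to op.gguf)
FROM ./op.gguf
```

Then, from the same folder: ollama create unsloth_m -f Modelfile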

lastrei avatar Dec 10 '24 14:12 lastrei

@jainpradeep Windows right? Also apologies on the delay - Modelfile should look like https://github.com/ollama/ollama/blob/main/docs/modelfile.md and Windows building for llama.cpp can be tough - see https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md

I was planning to add more stable support for Windows in the future

danielhanchen avatar Dec 12 '24 09:12 danielhanchen

The solution provided by @Zhangy-ly (checking out the llama.cpp b3345 tag) doesn't seem to work anymore. I used cmake as advised in the updated build documentation:

$ git checkout b3345
$ git submodule update --init --recursive
$ cmake -B build
$ cmake --build build --config Release
$ git log -1

CMake generates an out-of-source build by default, meaning the build artifacts (compiled binaries, etc.) are placed in a separate build folder (e.g., build/Release) instead of the source folder (llama.cpp). I copied all the binaries in the build folder to the root folder and re-ran the unsloth Llama_3_2_1B+3B_Conversational_+_2x_faster_finetuning Colab notebook, but I still get the same error:

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

As an alternate workaround I tried converting the model to GGUF manually using python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16, but the generated output file doesn't work with ollama:

Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"

The merged model files as suggested by @danielhanchen are in order: the config and safetensors files are present in the folder, and there are no errors while generating the merged model.

Can someone please suggest how I can use the model in ollama without converting it to GGUF? I have been trying to get this to work for a month. There were many issues along the way (corporate proxy, SSL, timeouts, dependency versions, and building llama.cpp; I tried make, cmake, ninja, and VS2022), but I am stuck on the final step: getting the model to work with ollama so I can use it in openweb-ui.

Please suggest what am I doing wrong?

jainpradeep avatar Dec 13 '24 06:12 jainpradeep

I got this issue on ubuntu, and the following steps worked for me.

  1. Build manually with cmake (It seems that make does not work anymore.), following the llama.cpp build instruction.
  2. The above creates executable files (llama-*) under llama.cpp/build/bin. Copy them over to directly under llama.cpp/.
  3. Re-run the failed unsloth call. (In my case, push_to_hub_gguf.)
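The steps above can be sketched as follows (the paths assume a checkout named llama.cpp in the current directory; the `if` guard simply skips the build when no checkout is present):

```shell
# Build out-of-tree with cmake, then copy the resulting binaries up to
# the repo root, where unsloth looks for llama-quantize.
if [ -d llama.cpp ]; then
  cmake -S llama.cpp -B llama.cpp/build
  cmake --build llama.cpp/build --config Release -j
  cp llama.cpp/build/bin/llama-* llama.cpp/
fi
```

After the copy, llama.cpp/llama-quantize exists at the path the unsloth check expects.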

hideaki avatar Dec 16 '24 14:12 hideaki

I got this issue on ubuntu, and the following steps worked for me.

  1. Build manually with cmake (It seems that make does not work anymore.), following the llama.cpp build instruction.
  2. The above creates executable files (llama-*) under llama.cpp/build/bin. Copy them over to directly under llama.cpp/.
  3. Re-run the failed unsloth call. (In my case, push_to_hub_gguf.)

It works for me.

In the Colab env I used:

!(cd llama.cpp; cmake -B build; cmake --build build --config Release)

Then I copied the executables to the /content/llama.cpp directory with cp and re-ran the cell.

hwpoison avatar Dec 25 '24 07:12 hwpoison

I got this issue on ubuntu, and the following steps worked for me.

  1. Build manually with cmake (It seems that make does not work anymore.), following the llama.cpp build instruction.
  2. The above creates executable files (llama-*) under llama.cpp/build/bin. Copy them over to directly under llama.cpp/.
  3. Re-run the failed unsloth call. (In my case, push_to_hub_gguf.)

It works for me.

In the Colab env I used:

!(cd llama.cpp; cmake -B build; cmake --build build --config Release)

Then I copied the executables to the /content/llama.cpp directory with cp and re-ran the cell.

Thanks. Same problem here; following these instructions solved it. It seems that unsloth's llama.cpp compilation logic should be updated.

fumiama avatar Jan 27 '25 07:01 fumiama