text-generation-webui
llama.cpp: loading model from models\Chinese-LLaMA-7B\ggml-model-q4_0.bin error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file? llama_init_from_file: failed to load model
Describe the bug
llama.cpp: loading model from models\Chinese-LLaMA-7B\ggml-model-q4_0.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?
llama_init_from_file: failed to load model
Traceback (most recent call last):
  File "G:\Soft\text-generation-webui\server.py", line 67, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "G:\Soft\text-generation-webui\modules\models.py", line 142, in load_model
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
  File "G:\Soft\text-generation-webui\modules\llamacpp_model.py", line 32, in from_pretrained
    self.model = Llama(**params)
  File "G:\Soft\text-generation-webui\python310\lib\site-packages\llama_cpp\llama.py", line 148, in __init__
    assert self.ctx is not None
AssertionError
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
The model can no longer be loaded after updating to the latest version.
Screenshot
No response
Logs
llama.cpp: loading model from models\Chinese-LLaMA-7B\ggml-model-q4_0.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?
llama_init_from_file: failed to load model
Traceback (most recent call last):
  File "G:\Soft\text-generation-webui\server.py", line 67, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "G:\Soft\text-generation-webui\modules\models.py", line 142, in load_model
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
  File "G:\Soft\text-generation-webui\modules\llamacpp_model.py", line 32, in from_pretrained
    self.model = Llama(**params)
  File "G:\Soft\text-generation-webui\python310\lib\site-packages\llama_cpp\llama.py", line 148, in __init__
    assert self.ctx is not None
AssertionError
System Info
Windows 11
I got the same error today. I installed on a fresh computer and got the same error message.
Has text-generation-webui incorporated the recent changes to the GGML file format from llama.cpp? Maybe that's the issue.
Okay guys, they updated the model. Just download the older one, ggml-vic13b-uncensored-q5_1.bin, from https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main. It's working fine for now; just do not run the update afterwards.
EDIT: place it here: C:\TCHT\oobabooga_windows\text-generation-webui\models\eachadea_ggml-vicuna-13b-1-1
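If you'd rather script the download than click through the browser, here is a rough sketch using huggingface_hub (pip install huggingface_hub). The repo_id and filename are the ones mentioned above; local_dir assumes a reasonably recent huggingface_hub release.
```python
# Sketch: fetch the older-format quantised file with huggingface_hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="eachadea/ggml-vicuna-13b-1.1",
    filename="ggml-vic13b-uncensored-q5_1.bin",
    local_dir="models/eachadea_ggml-vicuna-13b-1-1",  # target folder inside text-generation-webui
)
print("saved to:", path)
```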
For reference:
https://github.com/hwchase17/langchain/issues/2592#issuecomment-1502065790
The error is caused by the ggml model you're attempting to use not being compatible with the version of llama.cpp being used by the web-ui. If you're using TheBloke ggml models:
REQUIRES LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)! llama.cpp recently made a breaking change to its quantisation methods.
I have re-quantised the GGML files in this repo. Therefore you will require llama.cpp compiled on May 12th or later (commit b9fd7ee or later) to use them.
The previous files, which will still work in older versions of llama.cpp, can be found in branch previous_llama.
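For context, the (magic, version) pair in the log above, 67676a74 / 00000002, decodes to the "ggjt" magic with file version 2, i.e. the post-b9fd7ee format. Below is a minimal sketch for peeking at a GGML file's header yourself; the magic constants mirror llama.cpp's loader as I understand it, so treat them as illustrative rather than authoritative.
```python
# Minimal sketch: read the first 8 bytes of a GGML model file and report its magic and version.
import struct
import sys

MAGICS = {
    0x67676D6C: "ggml (unversioned, very old)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (versioned, mmap-able)",
}

def inspect(path: str) -> None:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))  # little-endian, as written by llama.cpp
        print(f"magic   : 0x{magic:08x} ({MAGICS.get(magic, 'unknown')})")
        if magic != 0x67676D6C:  # the old 'ggml' format has no version field
            (version,) = struct.unpack("<I", f.read(4))
            print(f"version : {version}")

if __name__ == "__main__":
    inspect(sys.argv[1])
```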
I'm not sure if the old models will work with the new llama.cpp mentioned above.
Using the one-click install on Windows, how do I update llama.cpp?
https://github.com/ggerganov/llama.cpp/pull/1405 It seems there is a problem with the new model format introduced by llama.cpp.
https://github.com/ggerganov/llama.cpp/issues/1408
It seems the problem is the new model format introduced by llama.cpp: the quantised model format has changed significantly, which is also why the new version cannot recognise the old q4 quantised models.
I'm not sure if the old models will work with the new llama.cpp mentioned above.
Unfortunately they won't.
- Models quantised before llama.cpp commit b9fd7ee will only work with llama.cpp from before that commit.
- And models quantised after llama.cpp commit b9fd7ee will only work with llama.cpp from after that commit.
All my GGML repos except the latest two have the previous_llama branch for users who can't upgrade llama.cpp yet, or are using UIs like text-generation-webui which haven't updated yet.
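If the repo you're using has that branch, something like the following should pull the older-format file; the repo_id and filename here are placeholders for illustration, while revision is a standard huggingface_hub parameter.
```python
# Sketch: download a file from a repo's previous_llama branch with huggingface_hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/some-model-GGML",  # placeholder repo id
    filename="some-model.q4_0.bin",      # placeholder filename
    revision="previous_llama",           # branch with the pre-b9fd7ee quantisations
)
print(path)
```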
Keep an eye on this issue in the pyllama-cpp repo. When pyllama-cpp updates to the latest code, it will be possible to do a manual upgrade of text-generation-webui. And I'm sure oobabooga will put out an official update soon.
A new llama-cpp-python version (0.1.50) is now available (installable with pip install llama-cpp-python==0.1.50).
For those using the one click installer (Windows): Edit the following lines in text-generation-webui/requirements.txt
Old
llama-cpp-python==0.1.45; platform_system != "Windows"
https://github.com/abetlen/llama-cpp-python/releases/download/v0.1.45/llama_cpp_python-0.1.45-cp310-cp310-win_amd64.whl; platform_system == "Windows"
New
llama-cpp-python==0.1.50; platform_system != "Windows"
https://github.com/abetlen/llama-cpp-python/releases/download/v0.1.50/llama_cpp_python-0.1.50-cp310-cp310-win_amd64.whl; platform_system == "Windows"
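After editing requirements.txt and re-running the installer/update step, you can sanity-check which build actually got installed. A small sketch, run from the webui's Python environment (nothing here is webui-specific):
```python
# Sketch: confirm the installed llama-cpp-python version and that it imports cleanly.
from importlib.metadata import version

print(version("llama-cpp-python"))  # expect 0.1.50 after the change above

from llama_cpp import Llama  # an ImportError here means the wheel did not install correctly
```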
Disclaimer: I'm an electrician, not a computer scientist, and I don't know what I'm doing, but it worked for me.
Very effective, thank you to the developer for the fix.
Will there need to be changes to the UI to access the new GPU acceleration of llama.cpp?
No need.
Will there need to be changes to the UI to access the new GPU acceleration of llama.cpp?
I think this is only in the test phase and not yet fully integrated. llama-cpp-python 0.1.50 doesn't use the GPU, at least not for me.