text-generation-webui
llama.cpp: loading model from models\Chinese-LLaMA-7B\ggml-model-q4_0.bin error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file? llama_init_from_file: failed to load model
Describe the bug
llama.cpp: loading model from models\Chinese-LLaMA-7B\ggml-model-q4_0.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?
llama_init_from_file: failed to load model
Traceback (most recent call last):
  File "G:\Soft\text-generation-webui\server.py", line 67, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "G:\Soft\text-generation-webui\modules\models.py", line 142, in load_model
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
  File "G:\Soft\text-generation-webui\modules\llamacpp_model.py", line 32, in from_pretrained
    self.model = Llama(**params)
  File "G:\Soft\text-generation-webui\python310\lib\site-packages\llama_cpp\llama.py", line 148, in __init__
    assert self.ctx is not None
AssertionError
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
The model can no longer be loaded after updating to the latest version.
Screenshot
No response
Logs
llama.cpp: loading model from models\Chinese-LLaMA-7B\ggml-model-q4_0.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?
llama_init_from_file: failed to load model
Traceback (most recent call last):
  File "G:\Soft\text-generation-webui\server.py", line 67, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "G:\Soft\text-generation-webui\modules\models.py", line 142, in load_model
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
  File "G:\Soft\text-generation-webui\modules\llamacpp_model.py", line 32, in from_pretrained
    self.model = Llama(**params)
  File "G:\Soft\text-generation-webui\python310\lib\site-packages\llama_cpp\llama.py", line 148, in __init__
    assert self.ctx is not None
AssertionError
System Info
Windows 11
I got the same error today. I installed on a fresh computer and got the same error message.
Has text-generation-webui incorporated the recent changes to the GGML file format from llama.cpp? Maybe that's the issue.
Okay guys, they updated the model. Just download the older one, ggml-vic13b-uncensored-q5_1.bin, from https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/tree/main. It's working fine for now; just do not run the update afterwards.
EDIT: place it here: C:\TCHT\oobabooga_windows\text-generation-webui\models\eachadea_ggml-vicuna-13b-1-1
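If you'd rather script the download than click through the browser, here is a rough sketch using huggingface_hub (pip install huggingface_hub). The repo_id and filename are the ones mentioned above; local_dir assumes a reasonably recent huggingface_hub release.
```python
# Sketch: fetch the older-format quantised file with huggingface_hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="eachadea/ggml-vicuna-13b-1.1",
    filename="ggml-vic13b-uncensored-q5_1.bin",
    local_dir="models/eachadea_ggml-vicuna-13b-1-1",  # target folder inside text-generation-webui
)
print("saved to:", path)
```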
For reference:
https://github.com/hwchase17/langchain/issues/2592#issuecomment-1502065790
The error is caused by the ggml model you're attempting to use not being compatible with the version of llama.cpp being used by the web-ui. If you're using TheBloke ggml models:
REQUIRES LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)! llama.cpp recently made a breaking change to its quantisation methods.
I have re-quantised the GGML files in this repo. Therefore you will require llama.cpp compiled on May 12th or later (commit b9fd7ee or later) to use them.
The previous files, which will still work in older versions of llama.cpp, can be found in branch previous_llama.
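For context, the (magic, version) pair in the log above, 67676a74 / 00000002, decodes to the "ggjt" magic with file version 2, i.e. the post-b9fd7ee format. Below is a minimal sketch for peeking at a GGML file's header yourself; the magic constants mirror llama.cpp's loader as I understand it, so treat them as illustrative rather than authoritative.
```python
# Minimal sketch: read the first 8 bytes of a GGML model file and report its magic and version.
import struct
import sys

MAGICS = {
    0x67676D6C: "ggml (unversioned, very old)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (versioned, mmap-able)",
}

def inspect(path: str) -> None:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))  # little-endian, as written by llama.cpp
        print(f"magic   : 0x{magic:08x} ({MAGICS.get(magic, 'unknown')})")
        if magic != 0x67676D6C:  # the old 'ggml' format has no version field
            (version,) = struct.unpack("<I", f.read(4))
            print(f"version : {version}")

if __name__ == "__main__":
    inspect(sys.argv[1])
```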
I'm not sure if the old models will work with the new llama.cpp mentioned above.
Using the one-click install on Windows, how do I update llama.cpp?
https://github.com/ggerganov/llama.cpp/pull/1405 It seems there is a problem with the new model format introduced by llama.cpp.
https://github.com/ggerganov/llama.cpp/issues/1408
It seems the problem is the new model format introduced by llama.cpp: the quantised model format has changed significantly, which is also why the new version cannot recognise the old q4 quantised models.
I'm not sure if the old models will work with the new llama.cpp mentioned above.
Unfortunately they won't.
- Models quantised before llama.cpp commit b9fd7ee will only work with llama.cpp from before that commit.
- And models quantised after llama.cpp commit b9fd7ee will only work with llama.cpp from after that commit.
All my GGML repos except the latest two have the previous_llama branch for users who can't upgrade llama.cpp yet, or are using UIs like text-generation-webui which haven't updated yet.
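If the repo you're using has that branch, something like the following should pull the older-format file; the repo_id and filename here are placeholders for illustration, while revision is a standard huggingface_hub parameter.
```python
# Sketch: download a file from a repo's previous_llama branch with huggingface_hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/some-model-GGML",  # placeholder repo id
    filename="some-model.q4_0.bin",      # placeholder filename
    revision="previous_llama",           # branch with the pre-b9fd7ee quantisations
)
print(path)
```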
Keep an eye on this issue in the pyllama-cpp repo. When pyllama-cpp updates to the latest code, it will be possible to do a manual upgrade of text-generation-webui. And I'm sure oobabooga will put out an official update soon.
A new llama-cpp-python version (0.1.50) is now available (installable with pip install llama-cpp-python==0.1.50).
For those using the one click installer (Windows): Edit the following lines in text-generation-webui/requirements.txt
Old
llama-cpp-python==0.1.45; platform_system != "Windows"
https://github.com/abetlen/llama-cpp-python/releases/download/v0.1.45/llama_cpp_python-0.1.45-cp310-cp310-win_amd64.whl; platform_system == "Windows"
New
llama-cpp-python==0.1.50; platform_system != "Windows"
https://github.com/abetlen/llama-cpp-python/releases/download/v0.1.50/llama_cpp_python-0.1.50-cp310-cp310-win_amd64.whl; platform_system == "Windows"
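After editing requirements.txt and re-running the installer/update step, you can sanity-check which build actually got installed. A small sketch, run from the webui's Python environment (nothing here is webui-specific):
```python
# Sketch: confirm the installed llama-cpp-python version and that it imports cleanly.
from importlib.metadata import version

print(version("llama-cpp-python"))  # expect 0.1.50 after the change above

from llama_cpp import Llama  # an ImportError here means the wheel did not install correctly
```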
Disclaimer: I'm an electrician, not a computer scientist, and I don't know what I'm doing, but it worked for me.
Very effective, thank you to the developer for the fix.
Will there need to be changes to the UI to access the new GPU acceleration of llama.cpp?
No need.
Will there need to be changes to the UI to access the new GPU acceleration of llama.cpp?
I think this is only in the test phase and not yet fully integrated. llama-cpp-python 0.1.50 doesn't use the GPU, at least not for me.