lollms-webui
ERROR - failed to load model from ./models/gpt4all-lora-quantized-ggml.bin
Current Behavior
The default model file (gpt4all-lora-quantized-ggml.bin) already exists. Do you want to replace it? Press B to download it with a browser (faster). [Y,N,B]?N
Skipping download of model file...
Cleaning tmp folder
Virtual environment created and packages installed successfully.
Launching application...
Checking discussions database...
[2023-04-18 10:11:49,423] {model.py:73} INFO - Loading model ...
llama_model_load: loading model from './models/gpt4all-lora-quantized-ggml.bin' - please wait ...
llama_model_load: invalid model file './models/gpt4all-lora-quantized-ggml.bin' (bad magic)
[2023-04-18 10:11:49,424] {model.py:75} ERROR - failed to load model from ./models/gpt4all-lora-quantized-ggml.bin
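Before launching the UI, you can check whether a downloaded file is actually a ggml binary by inspecting its first four bytes. This is a minimal diagnostic sketch, not part of the webui itself; the magic values are assumed from the llama.cpp error messages in this thread (newer formats use different headers):

```python
import struct

# Magic values llama.cpp checked for at the time (assumption: inferred from
# the "bad magic" errors in this thread).
KNOWN_MAGICS = {
    0x67676D6C: "ggml (unversioned)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (mmap-able)",
}

def check_model_magic(path):
    """Read the first 4 bytes as a little-endian uint32 and name the format."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return "file too short to be a model"
    (magic,) = struct.unpack("<I", header)
    return KNOWN_MAGICS.get(magic, f"bad magic 0x{magic:08x} - not a ggml file")
```

If this reports a bad magic, the download is usually truncated or not a model file at all, and re-downloading is the first thing to try.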
Steps to Reproduce
run webui.bat
Screenshots
What are your system specs? OS? CPU? RAM? Did you download the model correctly? Maybe it's corrupted; try downloading it with a browser, then copy it into the /models/ folder.
Win11, 9900K, 32G.
The file was downloaded completely without any issues. I tried downloading it again using the browser, but the file size and error message were the same.
The CPU seems to support AVX2.
I'm running this in an Ubuntu VM, and it seems to load just fine.
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Checking discussions database...
llama_model_load: loading model from './models/gpt4all-lora-quantized-ggml.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 5809.78 MB (+ 2052.00 MB per state)
llama_model_load: loading tensors from './models/gpt4all-lora-quantized-ggml.bin'
llama_model_load: model size = 4017.27 MB / num tensors = 291
llama_init_from_file: kv self size = 512.00 MB
Chatbot created successfully
* Serving Flask app 'GPT4All-WebUI'
* Debug mode: off
[2023-04-18 09:28:50,988] {_internal.py:224} INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:9600
* Running on http://mm.jj.ss:9600
[2023-04-18 09:28:50,988] {_internal.py:224} INFO - Press CTRL+C to quit
Well, I did a git pull just now on the VM, and I can't load it anymore either.
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Checking discussions database...
llama_model_load: loading model from './models/gpt4all-lora-quantized-ggml.bin' - please wait ...
./models/gpt4all-lora-quantized-ggml.bin: invalid model file (bad magic [got 0x6e756f46 want 0x67676a74])
you most likely need to regenerate your ggml files
the benefit is you'll get 10-100x faster load times
see https://github.com/ggerganov/llama.cpp/issues/91
use convert-pth-to-ggml.py to regenerate from original pth
use migrate-ggml-2023-03-30-pr613.py if you deleted originals
llama_init_from_file: failed to load model
Chatbot created successfully
* Serving Flask app 'GPT4All-WebUI'
* Debug mode: off
[2023-04-18 09:35:03,952] {_internal.py:224} INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:9600
* Running on http://mm.jj.ss:9600
[2023-04-18 09:35:03,952] {_internal.py:224} INFO - Press CTRL+C to quit
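The two magic values in the error above decode to ASCII when unpacked the way llama.cpp reads them (as a little-endian uint32), which hints at what went wrong. A small sketch, just to illustrate the decoding:

```python
import struct

got, want = 0x6E756F46, 0x67676A74  # values from the "bad magic" error above

# llama.cpp reads the magic as a little-endian uint32, so packing the value
# back with "<I" recovers the literal first four bytes of the file.
print(struct.pack("<I", got))   # the first four bytes actually on disk
print(struct.pack("<I", want))  # the header of a valid ggjt model
```

The "got" value unpacks to `b'Foun'`, so the file on disk appears to begin with text ("Foun..."), which would be consistent with a download that saved an HTTP error or redirect page instead of the model binary.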
@ParisNeo something is borked again. Maybe pyllamacpp... my hatred of Python keeps growing...
You got "bad magic" too.
It's most likely that the model-loading script was updated. This UI relies on other packages/repos, so when they get updated it takes some time for this repo's main dev to go through the code and fix it. He's on vacation right now, so just hang tight; it will be fixed eventually.
@Datou Hi, try this model; it works. The original model is messed up, I don't know why.
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Checking discussions database...
llama_model_load: loading model from './models/ggml-vicuna-13b-4bit-rev1.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 2
llama_model_load: type = 2
llama_model_load: ggml map size = 7759.84 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required = 9807.93 MB (+ 3216.00 MB per state)
llama_model_load: loading tensors from './models/ggml-vicuna-13b-4bit-rev1.bin'
llama_model_load: model size = 7759.40 MB / num tensors = 363
llama_init_from_file: kv self size = 800.00 MB
Chatbot created successfully
* Serving Flask app 'GPT4All-WebUI'
* Debug mode: off
[2023-04-18 12:36:03,700] {_internal.py:224} INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:9600
* Running on http://mama-mia.juu:9600
[2023-04-18 12:36:03,700] {_internal.py:224} INFO - Press CTRL+C to quit
For me it loaded after I pulled the newest changes from git and re-downloaded the gpt4all-lora-quantized-ggml.bin model.
Sorry guys, I have a very slow connection these days and I lost it yesterday. It should work now. If the problem is solved, please make sure to close the issue.