private-gpt
raise NameError(f"Could not load Llama model from path: {model_path}") NameError: Could not load Llama model from path: models/ggml-model-q4_0.bin
PS D:\privateGPT> python .\privateGPT.py
llama.cpp: loading model from models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1000
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
error loading model: this format is no longer supported (see https://github.com/ggerganov/llama.cpp/pull/1305)
llama_init_from_file: failed to load model
Traceback (most recent call last):
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\embeddings\llamacpp.py", line 78, in validate_environment
    values["client"] = Llama(
                       ^^^^^^
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\llama.py", line 161, in __init__
    assert self.ctx is not None
           ^^^^^^^^^^^^^^^^^^^^
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\privateGPT\privateGPT.py", line 57, in
Did you download and place the model in the models folder?
@BassAzayda Yup the models are in a folder called models.
And this is what's in my .env:
PERSIST_DIRECTORY=db
LLAMA_EMBEDDINGS_MODEL=models/ggml-model-q4_0.bin
MODEL_TYPE=GPT4All
MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
MODEL_N_CTX=1000
If you're using Windows, could it be that the slashes need to go the other way? Sorry, Mac user here.
@BassAzayda I thought about that as a possibility, which is why I'm trying it also on WSL (Ubuntu).
@BassAzayda Unfortunately changing the slashes doesn't work :( Nor does providing the full path. It seems like the path isn't the problem here, but rather an actual problem loading the model due to the Python version and the library being used.
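One way to rule the path itself out is to resolve it with pathlib, which accepts forward slashes on Windows. A small sketch, using the path from the .env:

from pathlib import Path

# pathlib normalizes separators, so forward slashes are fine on Windows
model_path = Path("models/ggml-model-q4_0.bin")
print(model_path.resolve())   # the absolute path the loader would actually see
print(model_path.is_file())   # True if the file really exists at that location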
@imartinez Any solution to this?
You have to have a matching Python version, if it is a Python-version-related problem. I think someone with a similar problem solved it by updating theirs to the latest.
@d2rgaming-9000 I do have the latest Python, 3.11.3
@b007zk have you tried passing an absolute path instead of a relative path in the .env?
That is what the readme asks for and what worked for me.
LLAMA_EMBEDDINGS_MODEL=/abspath/models/ggml-model-q4_0.bin
MODEL_PATH=/abspath/models/ggml-gpt4all-j-v1.3-groovy.bin
Where /abspath/ is the full, absolute path to that file.
And make sure you use =, not : (I ran into your same issue when I had a typo there).
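Put together, a Windows .env with absolute paths might look like this (D:/privateGPT is just a hypothetical install location; adjust it to wherever the repo actually lives):

PERSIST_DIRECTORY=db
LLAMA_EMBEDDINGS_MODEL=D:/privateGPT/models/ggml-model-q4_0.bin
MODEL_TYPE=GPT4All
MODEL_PATH=D:/privateGPT/models/ggml-gpt4all-j-v1.3-groovy.bin
MODEL_N_CTX=1000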
@smileBeda yes I have tried that, no luck unfortunately. I don't think the path is the problem here.
@b007zk I had the exact same issue in WSL. Please take a look at #198. I believe it addresses this issue and solves the problem. It introduces a file validator in the ingest.py module using pathlib's Path class. You can review the changes and verify whether they solve your issue.
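Without quoting the PR line by line, a file validator along those lines would look roughly like this sketch (the function name and error message are mine, not necessarily what #198 uses):

from pathlib import Path

def validate_model_file(raw_path: str) -> Path:
    """Resolve a model path from the .env and fail early if the file is missing."""
    path = Path(raw_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return path

# e.g. validate_model_file("models/ggml-model-q4_0.bin")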
@aHardReset Hey, thanks for your comment. I actually tried your changes but unfortunately I'm still running into the following issue when I run privateGPT.py:
Traceback (most recent call last):
File "D:\privateGPT\privateGPT.py", line 57, in
pip install llama-cpp-python==0.1.48 resolved my issue, along with pip install 'pygpt4all==v1.0.1' --force-reinstall, when using these models:
https://huggingface.co/mrgaang/aira/blob/main/gpt4all-converted.bin
https://huggingface.co/Pi3141/alpaca-native-7B-ggml/blob/main/ggml-model-q4_0.bin
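To confirm which versions actually ended up installed before re-running, a quick sanity check from Python (package names as pip knows them; this is just a check, not part of the fix):

from importlib.metadata import version, PackageNotFoundError

# Compare against the pins above: llama-cpp-python==0.1.48 and pygpt4all 1.0.1
for pkg in ("llama-cpp-python", "pygpt4all"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")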
@kuppu thanks but that didn't work for me.
Try running Python 3.10.x (use pyenv, if it's available on Windows, so you can switch environments), and also try another embeddings model from Hugging Face?
cpp is pretty f*'d up, so is it possible to just use the koboldcpp.exe file as a server for this? As far as I can tell, there is no reason to have to navigate any dependencies besides langchain and maybe web browsing for this project. IMO (and I am very stupid and unemployed), best practice would be to have the model hosted by an extension, with options such as:
GPT API
llama.cpp
localhost
remotehost
and koboldcpp.exe
and then have
langchain
urllib3
tabulate
tqdm
or whatever as core dependencies. PyTorch is also often an important dependency for llama models to run above 10 t/s, but different GPUs have different CUDA requirements, e.g. Tesla K80/P40/H100 or GTX 660/RTX 4090, not to mention AMD.
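To make the "model behind a local server" idea concrete, talking to a locally hosted model over HTTP would look roughly like this; the port, the /api/v1/generate path, and the payload fields are assumptions about koboldcpp's local API, not verified here:

import requests

# Hypothetical call to a local generation server (e.g. koboldcpp) instead of
# loading the model in-process; endpoint path and payload fields are assumed.
resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": "Summarize the ingested documents.", "max_length": 200},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())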