private-gpt
Illegal hardware instruction
Hello,
Thanks again for this nice project! After the ingest process, when I try to run python3 privateGPT.py I get an error:
llama.cpp: loading model from ./models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 2052.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 512.00 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
Using embedded DuckDB with persistence: data will be stored in: db
[1] 18314 illegal hardware instruction python3 privateGPT.py
Hardware: MacBook Pro M1
Software: macOS Ventura
Regards, Hisxo
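A note on the log above: on Apple Silicon, an "illegal hardware instruction" crash often means the interpreter or a native extension was compiled for x86_64 and is running translated under Rosetta 2, and the feature line (SSE3 = 1 but NEON = 0 and ARM_FMA = 0 on an M1) points the same way. Below is a minimal diagnostic sketch, not part of this repo, that checks how the current Python process is actually running:

```python
# Minimal diagnostic sketch (not from this repo): check whether this Python
# process runs natively on Apple Silicon or translated under Rosetta 2.
# An x86_64 interpreter or an x86_64 llama-cpp wheel on an M1 is a common
# cause of "illegal hardware instruction" crashes.
import platform
import subprocess

print("machine:", platform.machine())  # 'arm64' = native, 'x86_64' = Rosetta

# sysctl.proc_translated is 1 when running under Rosetta 2, 0 when native;
# querying it fails on Intel Macs, where the key does not exist.
try:
    out = subprocess.run(
        ["sysctl", "-n", "sysctl.proc_translated"],
        capture_output=True, text=True, check=True,
    )
    print("translated under Rosetta:", out.stdout.strip() == "1")
except subprocess.CalledProcessError:
    print("sysctl.proc_translated not available (likely an Intel Mac)")
```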
Same for me, but on an M1 Pro.
% python3 privateGPT.py
llama.cpp: loading model from ./models/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113748.20 KB
llama_model_load_internal: mem required = 5809.33 MB (+ 2052.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 512.00 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
Using embedded DuckDB with persistence: data will be stored in: db
zsh: illegal hardware instruction  python3 privateGPT.py
I think you need AVX and/or F16C:
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
"No, M1 is not based on the x86 architecture so it can, in no way shape or form get AVX, because AVX is defined only for x86_64 architecture."
I think your CPU isn't supported.
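One way to test that theory is to check which architecture the installed llama-cpp-python native library was actually built for. The package layout and library names in this sketch are assumptions for illustration, not something documented in this issue:

```python
# Hedged sketch: list the native libraries shipped with the installed
# llama-cpp-python package and ask `file` for their architecture.
# On an M1, a "Mach-O 64-bit ... x86_64" result would explain why the
# log reports SSE3 = 1 but NEON = 0 and ARM_FMA = 0.
import glob
import os
import subprocess

import llama_cpp  # the Python bindings privateGPT uses under the hood

pkg_dir = os.path.dirname(llama_cpp.__file__)
libs = glob.glob(os.path.join(pkg_dir, "**", "*.so"), recursive=True)
libs += glob.glob(os.path.join(pkg_dir, "**", "*.dylib"), recursive=True)
for lib in libs:
    # `file` prints e.g. "Mach-O 64-bit bundle x86_64" or "... arm64"
    print(subprocess.run(["file", lib], capture_output=True, text=True).stdout)
```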
Try this fork, which uses Qdrant instead of Chroma. I think Chroma relies on DuckDB, which might be the issue. Using Qdrant I don't get the "Using embedded DuckDB" message. Let me know.
Hello @alxspiker
Same error with CASALIOY :(
Try these steps: https://gist.github.com/cedrickchee/e8d4cb0c4b1df6cc47ce8b18457ebde0
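If the crash does come from an x86_64 build, the usual remedy is to rebuild llama-cpp-python natively for arm64. The sketch below uses only standard pip flags plus the macOS arch tool; whether it matches the exact steps in the gist is an assumption:

```python
# Hedged sketch: force pip to rebuild llama-cpp-python from source as a
# native arm64 extension instead of reusing a cached x86_64 wheel.
# Assumes the Python binary itself is arm64-native (or a universal binary);
# `arch -arm64` fails with "Bad CPU type" otherwise.
import subprocess
import sys

subprocess.run(
    [
        "arch", "-arm64",            # run the build natively on Apple Silicon
        sys.executable, "-m", "pip",
        "install", "--force-reinstall", "--no-cache-dir",
        "llama-cpp-python",
    ],
    check=True,
)
```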
I am getting the same 'illegal hardware instruction' error on my M1 Pro after running privateGPT.py.
Hardware: MacBook Pro M1
Software: macOS Monterey
Python version: 3.10.11
@alxspiker The model provided in the README of this repo is in the same ggml quantized format as the one given in the link you provided. In that case, it should ideally work on an M1, right?
@alxspiker -- This needs to be built from source on M1, right? https://github.com/su77ungr/CASALIOY