
[BUG] - llama-cpp will not load local models

Open · ajweber opened this issue 1 year ago · 1 comment

Description

I have tried a number of Hugging Face models and consistently get the error message: llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291

This appears to be an old bug that was fixed months ago in llama-cpp. Is it possible your run_linux.sh script is installing an older version of llama-cpp (and/or its Python server, llama-cpp-python)?
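
A quick way to confirm which version the script actually installed (run inside the environment kotaemon sets up; this assumes the package imports as llama_cpp, which is how llama-cpp-python exposes itself):

```sh
# Print the llama-cpp-python version in the active environment
python -c "import llama_cpp; print(llama_cpp.__version__)"
# Or ask pip directly
pip show llama-cpp-python
```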

Reproduction steps

Change the LOCAL_MODEL env var to point to a llama-3.1-8B...gguf model downloaded from Hugging Face.
Execute run_linux.sh.
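
Concretely, the invocation looks roughly like this (the model filename below is hypothetical; substitute whatever you downloaded, and set the variable however your setup expects, e.g. an export or the project's .env file):

```sh
# Hypothetical model path -- point LOCAL_MODEL at your downloaded GGUF file
export LOCAL_MODEL="$HOME/models/llama-3.1-8B-instruct.Q4_K_M.gguf"
./run_linux.sh
```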

Screenshots

No response

Logs

No response

Browsers

Other

OS

Linux

Additional information

No response

ajweber · Sep 03 '24 20:09

Still cannot test fully due to the other bug I logged (the startup issue). However, if I change the script to download and install llama_cpp_python v0.2.90, it loads the local model and startup completes correctly.

The end of the output is:

INFO:     Started server process [0000]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:31415 (Press CTRL+C to quit)
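
For anyone wanting to reproduce the workaround, the change amounts to pinning the package version. The exact line in run_linux.sh will differ; this is just a sketch of the pin:

```sh
# Pin llama-cpp-python to 0.2.90, a release recent enough to load
# this GGUF (per the comment above)
pip install "llama_cpp_python==0.2.90"
```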

ajweber · Sep 05 '24 18:09