llama-cpp-python
ValueError: Failed to create llama_context
(yuna2) (base) adm@Adms-MacBook-Pro yuna-ai % python index.py
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
^~~~~~~~~~~~~~~
" UserInfo={NSLocalizedDescription=program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
^~~~~~~~~~~~~~~
}
llama_new_context_with_model: failed to initialize Metal backend
Traceback (most recent call last):
File "/Users/adm/Desktop/yuna-ai/index.py", line 171, in
Maybe you need to reinstall llama-cpp-python with the following command:
CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir
Answer from: https://github.com/abetlen/llama-cpp-python/issues/1285#issuecomment-2007778703
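After reinstalling, one quick way to confirm the rebuilt wheel can actually create a context on the Metal backend is to load a model with verbose logging on. This is only a minimal sketch; the model path and the layer count are placeholders, not values from this thread:

```python
from llama_cpp import Llama

# Placeholder path; point this at any local GGUF model.
llm = Llama(
    model_path="./models/your-model.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    verbose=True,     # print backend initialization messages to stderr
)

# If the Metal backend initialized correctly, constructing Llama above
# no longer raises "ValueError: Failed to create llama_context".
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```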
Hey, @JackyCCK2126! I was having the same issue, and now it works! Thanks, but what was the problem?
Also, is there any workaround to speed up the generation on the M1?
@yukiarimo I don't know much about the M1. But in general, you can offload more layers to the GPU and lower the context size when initializing the Llama class by setting n_gpu_layers and n_ctx (top_p and top_k may also affect speed a bit). If it is still too slow, you can choose a smaller model.
However, if your prompt is not too long, you should get around 7 to 12 tokens per second, which is somewhat acceptable for me.
@yukiarimo If you find a speed-up solution, please let me know. XD
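For reference, here is a minimal sketch of that kind of initialization. All values below are illustrative placeholders rather than settings from this thread; tune them for your own hardware and model:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.gguf",  # placeholder path to a GGUF model
    n_gpu_layers=-1,  # offload all layers to the GPU (Metal on Apple Silicon)
    n_ctx=2048,       # a smaller context window reduces memory use
)

output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    top_p=0.95,  # sampling settings may also affect speed slightly
    top_k=40,
)
print(output["choices"][0]["text"])
```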
Maybe you need to reinstall llama-cpp-python with the following command:
CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir
Answer from: #1285 (comment)
Doesn't seem to solve it for me... Do you happen to know if I'm missing something?
That's the end of the traceback:
[screenshot of the end of the traceback]
I get the same result too: it failed to create llama_context. I was wondering why I need to set -DLLAMA_METAL=on? I think Metal is for the MacBook, but I am running llama.cpp on a Windows PC.
Yes. Metal is only for Apple's products.