llama-cpp-python
ValueError: Failed to create llama_context
(yuna2) (base) adm@Adms-MacBook-Pro yuna-ai % python index.py
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
^~~~~~~~~~~~~~~
" UserInfo={NSLocalizedDescription=program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
^~~~~~~~~~~~~~~
}
llama_new_context_with_model: failed to initialize Metal backend
Traceback (most recent call last):
File "/Users/adm/Desktop/yuna-ai/index.py", line 171, in
Maybe you need to reinstall llama-cpp-python with the following command:
CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir
Answer from: https://github.com/abetlen/llama-cpp-python/issues/1285#issuecomment-2007778703
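After reinstalling, one quick way to confirm the rebuilt wheel can actually create a context on the Metal backend is to load a model with verbose logging on. This is only a minimal sketch; the model path and the layer count are placeholders, not values from this thread:

```python
from llama_cpp import Llama

# Placeholder path; point this at any local GGUF model.
llm = Llama(
    model_path="./models/your-model.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    verbose=True,     # print backend initialization messages to stderr
)

# If the Metal backend initialized correctly, constructing Llama above
# no longer raises "ValueError: Failed to create llama_context".
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```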
Hey, @JackyCCK2126! I was having the same issue, and now it works! Thanks, but what was the problem?
Also, is there any workaround to speed up the generation on the M1?
@yukiarimo I don't know much about the M1. But in general, you can offload more layers to the GPU and lower the context size when initializing the Llama class by setting n_gpu_layers and n_ctx (top_p and top_k may also affect speed a bit). If it is still too slow, you can choose a smaller model.
However, if your prompt is not too long, you should get around 7 to 12 tokens per second, which is somewhat acceptable for me.
@yukiarimo If you find a speed-up solution, please let me know. XD
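For reference, here is a minimal sketch of that kind of initialization. All values below are illustrative placeholders rather than settings from this thread; tune them for your own hardware and model:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.gguf",  # placeholder path to a GGUF model
    n_gpu_layers=-1,  # offload all layers to the GPU (Metal on Apple Silicon)
    n_ctx=2048,       # a smaller context window reduces memory use
)

output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    top_p=0.95,  # sampling settings may also affect speed slightly
    top_k=40,
)
print(output["choices"][0]["text"])
```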
Maybe you need to reinstall llama-cpp-python with the following command:
CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir
Answer from: #1285 (comment)
Doesn't seem to solve it for me... Do you happen to know if I'm missing something?
That's the end of the traceback:
[screenshot of the end of the traceback]
I get the same result too: it failed to create llama_context. I was wondering why I need to set -DLLAMA_METAL=on? I think Metal is for the MacBook, but I am running llama.cpp on a Windows PC.
Yes. Metal is only for Apple's products.