Llama 4 not working
```
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'llama4'
llama_model_load_from_file_impl: failed to load model
```
Please update to a newer version of llama.cpp:
https://github.com/ggml-org/llama.cpp/releases/tag/b5074
My fork has added some Llama 4 updates: https://github.com/JamePeng/llama-cpp-python
Same issue, how to run llama4?
@kerlion What version of llama-cpp-python are you using? Can you also give me some insight into your platform (OS, etc.)?
Image: nvidia/cuda:12.2.0-runtime-ubuntu22.04, llama_cpp_python 0.3.8
I compiled it from source and got past this error. But I do not know which "chat_format" to use for Llama-4-Scout-17B-16E-Instruct-UD-Q2_K_XL.
@kerlion
Great job on compiling it from source. Below is a command that might save you the struggle of compiling from source:
```
CMAKE_ARGS="-DGGML_CUDA=ON -DLLAMA_LLAVA=OFF" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```
Furthermore, I wasn't quite able to understand your message about which "chat_format" to use; can you please elaborate?
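For context on what `chat_format` controls: in llama-cpp-python it selects the prompt template used to flatten a list of chat messages into a single prompt string, and recent GGUF files usually embed a chat template in their metadata, so it can often be left unset. As an illustration only (this is not the library's own code, and the special-token names follow the Llama 3 convention, which may differ for Llama 4), a chat-format template roughly does this:

```python
# Illustrative sketch of what a chat_format template does: it flattens a
# list of {"role", "content"} messages into one prompt string. This mimics
# the general shape of the Llama 3 template; it is NOT llama-cpp-python's code.
def format_chat(messages: list[dict]) -> str:
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Leave the assistant header open so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(format_chat([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"},
]))
```

In practice, with a recent llama-cpp-python you can usually omit `chat_format` entirely and let the template embedded in the GGUF metadata be applied.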
same error with llama_cpp_python 0.3.8:
```
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 62.90 GiB (5.01 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'llama4'
llama_model_load_from_file_impl: failed to load model
```
My fork has added some Llama 4 updates: https://github.com/JamePeng/llama-cpp-python
Could you please provide the commit hash you built from?
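To answer version questions like this quickly, a small stdlib-only helper (hypothetical, not part of llama-cpp-python) can report the installed package version; for a fork built from a git checkout, `git rev-parse HEAD` in that checkout gives the actual commit:

```python
# Hypothetical helper: report an installed package's version using only the
# standard library. "llama-cpp-python" is the pip distribution name.
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str) -> str:
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

print(installed_version("llama-cpp-python"))
```

This pins the release (e.g. 0.3.8) but not the exact commit, so for source builds the git hash is still the most useful detail to share.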