Llama 4 not working
```
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'llama4'
llama_model_load_from_file_impl: failed to load model
```
Please update to a newer version of llama.cpp:
https://github.com/ggml-org/llama.cpp/releases/tag/b5074
My fork has added some Llama 4 updates: https://github.com/JamePeng/llama-cpp-python
Same issue, how to run llama4?
@kerlion What version of llama-cpp-python are you using? Can you also give me some insight into your platform (OS, etc.)?
Image: nvidia/cuda:12.2.0-runtime-ubuntu22.04, llama_cpp_python 0.3.8
I compiled it from source and got past this error. But I do not know which "chat_format" to use for Llama-4-Scout-17B-16E-Instruct-UD-Q2_K_XL.
@kerlion
Great job on compiling it from source. Below is a command that might save you the struggle of compiling from source:
```
CMAKE_ARGS="-DGGML_CUDA=ON -DLLAMA_LLAVA=OFF" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```
Furthermore, I wasn't quite able to understand your message about which "chat_format" to use; can you please elaborate?
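For context on what `chat_format` controls: in llama-cpp-python it selects the prompt template used to flatten a list of chat messages into a single prompt string, and recent GGUF files usually embed a chat template in their metadata, so it can often be left unset. As an illustration only (this is not the library's own code, and the special-token names follow the Llama 3 convention, which may differ for Llama 4), a chat-format template roughly does this:

```python
# Illustrative sketch of what a chat_format template does: it flattens a
# list of {"role", "content"} messages into one prompt string. This mimics
# the general shape of the Llama 3 template; it is NOT llama-cpp-python's code.
def format_chat(messages: list[dict]) -> str:
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Leave the assistant header open so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(format_chat([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"},
]))
```

In practice, with a recent llama-cpp-python you can usually omit `chat_format` entirely and let the template embedded in the GGUF metadata be applied.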
same error with llama_cpp_python 0.3.8:
```
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 62.90 GiB (5.01 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'llama4'
llama_model_load_from_file_impl: failed to load model
```
My fork has added some Llama 4 updates: https://github.com/JamePeng/llama-cpp-python
Could you please provide the commit hash you built from?
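To answer version questions like this quickly, a small stdlib-only helper (hypothetical, not part of llama-cpp-python) can report the installed package version; for a fork built from a git checkout, `git rev-parse HEAD` in that checkout gives the actual commit:

```python
# Hypothetical helper: report an installed package's version using only the
# standard library. "llama-cpp-python" is the pip distribution name.
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str) -> str:
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

print(installed_version("llama-cpp-python"))
```

This pins the release (e.g. 0.3.8) but not the exact commit, so for source builds the git hash is still the most useful detail to share.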