llama.cpp
Eval bug: Loading fail on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
Name and Version
llama-server.exe --version ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none version: 4880 (2048b591) built with MSVC 19.43.34808.0 for x64
Operating systems
Windows
GGML backends
Vulkan
Hardware
Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none
Models
gemma3:12b
Problem description & steps to reproduce
It's unable to load the Gemma 3 12B GGUF model.
First Bad Commit
No response
Relevant log output
llama-server.exe -m %file_path_gemma3_12b% --no-mmap -c 16384 -np 1 -ngl 50 --temp 0.1 -t 9 -tb 8 -C FF000 --no-perf --host 0.0.0.0 --port 3000
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none
Not enough set bits in CPU mask (8) to satisfy requested thread count: 9
Not enough set bits in CPU mask (8) to satisfy requested thread count: 9
build: 4880 (2048b591) with MSVC 19.43.34808.0 for x64
system info: n_threads = 9, n_threads_batch = 8, total_threads = 20
system_info: n_threads = 9 (n_threads_batch = 8) / 20 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
main: HTTP server is listening, hostname: 0.0.0.0, port: 3000, http threads: 19
main: loading model
srv load_model: loading model 'D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3'
llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Iris(R) Xe Graphics) - 16224 MiB free
llama_model_loader: loaded meta data with 35 key-value pairs and 1065 tensors from D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: gemma3.attention.head_count u32 = 16
llama_model_loader: - kv 1: gemma3.attention.head_count_kv u32 = 8
llama_model_loader: - kv 2: gemma3.attention.key_length u32 = 256
llama_model_loader: - kv 3: gemma3.attention.sliding_window u32 = 1024
llama_model_loader: - kv 4: gemma3.attention.value_length u32 = 256
llama_model_loader: - kv 5: gemma3.block_count u32 = 48
llama_model_loader: - kv 6: gemma3.context_length u32 = 8192
llama_model_loader: - kv 7: gemma3.embedding_length u32 = 3840
llama_model_loader: - kv 8: gemma3.feed_forward_length u32 = 15360
llama_model_loader: - kv 9: gemma3.vision.attention.head_count u32 = 16
llama_model_loader: - kv 10: gemma3.vision.attention.layer_norm_epsilon f32 = 0.000001
llama_model_loader: - kv 11: gemma3.vision.block_count u32 = 27
llama_model_loader: - kv 12: gemma3.vision.embedding_length u32 = 1152
llama_model_loader: - kv 13: gemma3.vision.feed_forward_length u32 = 4304
llama_model_loader: - kv 14: gemma3.vision.image_size u32 = 896
llama_model_loader: - kv 15: gemma3.vision.num_channels u32 = 3
llama_model_loader: - kv 16: gemma3.vision.patch_size u32 = 14
llama_model_loader: - kv 17: general.architecture str = gemma3
llama_model_loader: - kv 18: tokenizer.chat_template str = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 21: tokenizer.ggml.add_padding_token bool = false
llama_model_loader: - kv 22: tokenizer.ggml.add_unknown_token bool = false
llama_model_loader: - kv 23: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 24: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,514906] = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
llama_model_loader: - kv 26: tokenizer.ggml.model str = llama
llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 28: tokenizer.ggml.pre str = default
llama_model_loader: - kv 29: tokenizer.ggml.scores arr[f32,262145] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,262145] = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 31: tokenizer.ggml.tokens arr[str,262145] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 32: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - kv 34: general.file_type u32 = 15
llama_model_loader: - type f32: 563 tensors
llama_model_loader: - type f16: 165 tensors
llama_model_loader: - type q4_K: 290 tensors
llama_model_loader: - type q6_K: 47 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 7.57 GiB (5.34 BPW)
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3'
srv load_model: failed to load model, 'D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3'
srv operator (): operator (): cleaning up before exit...
main: exiting due to model loading error
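For context, the missing key `gemma3.attention.layer_norm_rms_epsilon` stores the epsilon hyperparameter used by Gemma 3's RMSNorm layers. A minimal NumPy sketch of RMSNorm (not llama.cpp's actual implementation, which is in C/C++) shows what that value does:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Scale activations by the reciprocal of their root mean square; eps keeps
    # the denominator away from zero. llama.cpp reads eps from the GGUF key
    # gemma3.attention.layer_norm_rms_epsilon -- the key this file lacks.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```

Since the value is just a small scalar (typically 1e-6), the failure here is purely about the metadata key being absent from the file, not about the tensor data itself.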
Well, I think this is a specific issue with the model file from Ollama. I just tried another one from ModelScope, and it seems to work like a charm.
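One way to confirm which metadata keys a GGUF file actually carries is to read its KV section directly. Below is a toy sketch of the GGUF v3 layout that handles only scalar `uint32`/`float32` values (real files also contain strings, arrays, and tensor info, so this is for illustration, not a general parser; for real files, the `gguf` Python package from the llama.cpp repo is the supported tool):

```python
import struct

GGUF_MAGIC = b"GGUF"
# Subset of GGUF metadata value types (per the GGUF spec in llama.cpp)
T_UINT32, T_FLOAT32 = 4, 6
_SCALARS = {T_UINT32: ("<I", 4), T_FLOAT32: ("<f", 4)}

def gguf_string(s: str) -> bytes:
    # GGUF strings are a uint64 length followed by UTF-8 bytes
    data = s.encode("utf-8")
    return struct.pack("<Q", len(data)) + data

def build_header(kvs: dict) -> bytes:
    """Build a GGUF v3 header plus a scalar-only metadata section (no tensors)."""
    out = GGUF_MAGIC + struct.pack("<IQQ", 3, 0, len(kvs))
    for key, (vtype, value) in kvs.items():
        fmt, _ = _SCALARS[vtype]
        out += gguf_string(key) + struct.pack("<I", vtype) + struct.pack(fmt, value)
    return out

def list_metadata_keys(blob: bytes) -> list:
    """Walk the metadata KV section and return the key names."""
    assert blob[:4] == GGUF_MAGIC, "not a GGUF file"
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", blob, 4)
    off, keys = 24, []  # 4 magic + 4 version + 8 tensor_count + 8 kv_count
    for _ in range(n_kv):
        (klen,) = struct.unpack_from("<Q", blob, off); off += 8
        keys.append(blob[off:off + klen].decode("utf-8")); off += klen
        (vtype,) = struct.unpack_from("<I", blob, off); off += 4
        off += _SCALARS[vtype][1]  # skip the scalar value
    return keys
```

Running `list_metadata_keys` over the KV dump above would show that the Ollama blob exposes `gemma3.vision.attention.layer_norm_epsilon` but no `gemma3.attention.layer_norm_rms_epsilon`, which is exactly the key the loader aborts on.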
The same issue occurs here when using Ollama gemma3:4b.
modelscope download --local_dir $PWD lmstudio-community/gemma-3-4b-it-GGUF --include *Q4_K_M*
./llama-cli -m /data/.yeahdongcn/models/gemma-3-4b-it-Q4_K_M.gguf -ngl 999
This works for me.
I am running similar errors with gemma3:27b + llama-server
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'gemma3_27b.gguf'
srv load_model: failed to load model, 'gemma3_27b.gguf'
srv operator(): operator(): cleaning up before exit...
terminate called without an active exception
main: exiting due to model loading error
Aborted (core dumped)
I have the same issue with Gemma3 27B
Hello
Users have hit this in RamaLama also:
Attempted to download Gemma3 from Ollama registry with ramalama run gemma3
Name pulled from https://www.ollama.com/library/gemma3
Got an error when running ramalama run gemma3:latest
Loading model
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_model_load_from_file_impl: failed to load model
initialize_model: error: unable to load model from file: /mnt/models/model.file
Cannot run model
Just tagging @ochafik and @jan-wassenberg for awareness
@ngxson Hey Son - thought you might want a tag here. Let me know if I can do anything to help!
llama.cpp has never supported models from Ollama; they have their own implementation.
Also, please note that I already warned about the incompatibility between llama.cpp and Ollama when llama-run added support for it.
Please do not tag me about this subject in the future.
Likely related:
https://github.com/ggml-org/llama.cpp/issues/12857
Is this Ollama-specific? The above issue doesn't seem to involve an Ollama model.
I do think we should try to fix this one way or another; gemma3 is a very popular model:
$ ramalama run gemma3
Loading model
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_model_load_from_file_impl: failed to load model
Although I do wonder whether this could be fixed on Google's side, by ensuring that on the next update Google publishes portable GGUFs to the Ollama registry. Sorry for the tags again @ochafik @jan-wassenberg, you are the only Google AI people I know!
I'm pointing people to these instead for now:
https://github.com/containers/ramalama/pull/1288/files
This issue was closed because it has been inactive for 14 days since being marked as stale.
I pulled gemma3:27b (a418f5838eaf) three days ago. It runs fine from the Ollama CLI; however, I get this error when loading it into LM Studio.
Same here. Did you manage to fix it?
Negative. I'm not up for spending hours, or more likely days, updating Gemma's attention hyperparameters, especially when the problem may be with LM Studio's API. I will, however, try loading the model into AnythingLLM when I get the chance and update this thread accordingly.
The gemma3:27b model loads and runs in AnythingLLM, and as expected in Ollama's new UI. It's possible the problem originated from the local conversion to GGUF. You may want to grab one of the GGUF versions on Hugging Face.