llama.cpp
Eval bug: Loading fail on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
Name and Version
llama-server.exe --version ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none version: 4880 (2048b591) built with MSVC 19.43.34808.0 for x64
Operating systems
Windows
GGML backends
Vulkan
Hardware
Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none
Models
gemma3:12b
Problem description & steps to reproduce
It's unable to load the Gemma 3 12B GGUF model.
First Bad Commit
No response
Relevant log output
llama-server.exe -m %file_path_gemma3_12b% --no-mmap -c 16384 -np 1 -ngl 50 --temp 0.1 -t 9 -tb 8 -C FF000 --no-perf --host 0.0.0.0 --port 3000
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none
Not enough set bits in CPU mask (8) to satisfy requested thread count: 9
Not enough set bits in CPU mask (8) to satisfy requested thread count: 9
build: 4880 (2048b591) with MSVC 19.43.34808.0 for x64
system info: n_threads = 9, n_threads_batch = 8, total_threads = 20
system_info: n_threads = 9 (n_threads_batch = 8) / 20 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
main: HTTP server is listening, hostname: 0.0.0.0, port: 3000, http threads: 19
main: loading model
srv load_model: loading model 'D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3'
llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Iris(R) Xe Graphics) - 16224 MiB free
llama_model_loader: loaded meta data with 35 key-value pairs and 1065 tensors from D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: gemma3.attention.head_count u32 = 16
llama_model_loader: - kv 1: gemma3.attention.head_count_kv u32 = 8
llama_model_loader: - kv 2: gemma3.attention.key_length u32 = 256
llama_model_loader: - kv 3: gemma3.attention.sliding_window u32 = 1024
llama_model_loader: - kv 4: gemma3.attention.value_length u32 = 256
llama_model_loader: - kv 5: gemma3.block_count u32 = 48
llama_model_loader: - kv 6: gemma3.context_length u32 = 8192
llama_model_loader: - kv 7: gemma3.embedding_length u32 = 3840
llama_model_loader: - kv 8: gemma3.feed_forward_length u32 = 15360
llama_model_loader: - kv 9: gemma3.vision.attention.head_count u32 = 16
llama_model_loader: - kv 10: gemma3.vision.attention.layer_norm_epsilon f32 = 0.000001
llama_model_loader: - kv 11: gemma3.vision.block_count u32 = 27
llama_model_loader: - kv 12: gemma3.vision.embedding_length u32 = 1152
llama_model_loader: - kv 13: gemma3.vision.feed_forward_length u32 = 4304
llama_model_loader: - kv 14: gemma3.vision.image_size u32 = 896
llama_model_loader: - kv 15: gemma3.vision.num_channels u32 = 3
llama_model_loader: - kv 16: gemma3.vision.patch_size u32 = 14
llama_model_loader: - kv 17: general.architecture str = gemma3
llama_model_loader: - kv 18: tokenizer.chat_template str = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 21: tokenizer.ggml.add_padding_token bool = false
llama_model_loader: - kv 22: tokenizer.ggml.add_unknown_token bool = false
llama_model_loader: - kv 23: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 24: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,514906] = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
llama_model_loader: - kv 26: tokenizer.ggml.model str = llama
llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 28: tokenizer.ggml.pre str = default
llama_model_loader: - kv 29: tokenizer.ggml.scores arr[f32,262145] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,262145] = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 31: tokenizer.ggml.tokens arr[str,262145] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 32: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - kv 34: general.file_type u32 = 15
llama_model_loader: - type f32: 563 tensors
llama_model_loader: - type f16: 165 tensors
llama_model_loader: - type q4_K: 290 tensors
llama_model_loader: - type q6_K: 47 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 7.57 GiB (5.34 BPW)
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3'
srv load_model: failed to load model, 'D:\OllamaModels\blobs\sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3'
srv operator (): operator (): cleaning up before exit...
main: exiting due to model loading error
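For context, the missing key `gemma3.attention.layer_norm_rms_epsilon` stores the epsilon hyperparameter used by Gemma 3's RMSNorm layers. A minimal NumPy sketch of RMSNorm (not llama.cpp's actual implementation, which is in C/C++) shows what that value does:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Scale activations by the reciprocal of their root mean square; eps keeps
    # the denominator away from zero. llama.cpp reads eps from the GGUF key
    # gemma3.attention.layer_norm_rms_epsilon -- the key this file lacks.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```

Since the value is just a small scalar (typically 1e-6), the failure here is purely about the metadata key being absent from the file, not about the tensor data itself.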
Well, I think this is a specific issue with the model file from Ollama. I just tried another one from ModelScope, and it seems to work like a charm.
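One way to confirm which metadata keys a GGUF file actually carries is to read its KV section directly. Below is a toy sketch of the GGUF v3 layout that handles only scalar `uint32`/`float32` values (real files also contain strings, arrays, and tensor info, so this is for illustration, not a general parser; for real files, the `gguf` Python package from the llama.cpp repo is the supported tool):

```python
import struct

GGUF_MAGIC = b"GGUF"
# Subset of GGUF metadata value types (per the GGUF spec in llama.cpp)
T_UINT32, T_FLOAT32 = 4, 6
_SCALARS = {T_UINT32: ("<I", 4), T_FLOAT32: ("<f", 4)}

def gguf_string(s: str) -> bytes:
    # GGUF strings are a uint64 length followed by UTF-8 bytes
    data = s.encode("utf-8")
    return struct.pack("<Q", len(data)) + data

def build_header(kvs: dict) -> bytes:
    """Build a GGUF v3 header plus a scalar-only metadata section (no tensors)."""
    out = GGUF_MAGIC + struct.pack("<IQQ", 3, 0, len(kvs))
    for key, (vtype, value) in kvs.items():
        fmt, _ = _SCALARS[vtype]
        out += gguf_string(key) + struct.pack("<I", vtype) + struct.pack(fmt, value)
    return out

def list_metadata_keys(blob: bytes) -> list:
    """Walk the metadata KV section and return the key names."""
    assert blob[:4] == GGUF_MAGIC, "not a GGUF file"
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", blob, 4)
    off, keys = 24, []  # 4 magic + 4 version + 8 tensor_count + 8 kv_count
    for _ in range(n_kv):
        (klen,) = struct.unpack_from("<Q", blob, off); off += 8
        keys.append(blob[off:off + klen].decode("utf-8")); off += klen
        (vtype,) = struct.unpack_from("<I", blob, off); off += 4
        off += _SCALARS[vtype][1]  # skip the scalar value
    return keys
```

Running `list_metadata_keys` over the KV dump above would show that the Ollama blob exposes `gemma3.vision.attention.layer_norm_epsilon` but no `gemma3.attention.layer_norm_rms_epsilon`, which is exactly the key the loader aborts on.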
The same issue occurs here when using Ollama gemma3:4b.
modelscope download --local_dir $PWD lmstudio-community/gemma-3-4b-it-GGUF --include *Q4_K_M*
./llama-cli -m /data/.yeahdongcn/models/gemma-3-4b-it-Q4_K_M.gguf -ngl 999
This works for me.
I am running similar errors with gemma3:27b + llama-server
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'gemma3_27b.gguf'
srv load_model: failed to load model, 'gemma3_27b.gguf'
srv operator(): operator(): cleaning up before exit...
terminate called without an active exception
main: exiting due to model loading error
Aborted (core dumped)
I have the same issue with Gemma3 27B
Hello
Users have hit this in RamaLama also:
Attempted to download Gemma3 from Ollama registry with ramalama run gemma3
Name pulled from https://www.ollama.com/library/gemma3
Got an error when running ramalama run gemma3:latest
Loading model
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_model_load_from_file_impl: failed to load model
initialize_model: error: unable to load model from file: /mnt/models/model.file
Cannot run model
Just tagging @ochafik and @jan-wassenberg for awareness
@ngxson Hey Son - thought you might want a tag here. Let me know if I can do anything to help!
llama.cpp has never supported models from Ollama; they have their own implementation.
Also, please note that I already warned about the incompatibility between llama.cpp and Ollama when llama-run added support for it.
Please do not tag me about this subject in the future.
Likely related:
https://github.com/ggml-org/llama.cpp/issues/12857
Is this Ollama-specific? The above issue doesn't seem to involve an Ollama model.
I do think we should try to fix this one way or another; gemma3 is a very popular model:
$ ramalama run gemma3
Loading model
llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_model_load_from_file_impl: failed to load model
Although I do wonder whether this could be fixed on Google's side, by ensuring that on the next update Google publishes portable GGUFs to the Ollama registry. Sorry for the tags again @ochafik @jan-wassenberg, you are the only Google AI people I know!
I'm pointing people to these instead for now:
https://github.com/containers/ramalama/pull/1288/files
This issue was closed because it has been inactive for 14 days since being marked as stale.
I pulled gemma3:27b (a418f5838eaf) three days ago. It runs fine from the Ollama CLI; however, I get this error when loading it into LM Studio.
Same here. Did you manage to fix it?
Negative. I'm not up for spending hours, or more likely days, updating Gemma's attention hyperparameters, especially when the problem may be with LM Studio's API. I will, however, try loading the model into AnythingLLM when I get the chance and update this thread accordingly.
The gemma3:27b model loads and runs in AnythingLLM, and as expected in Ollama's new UI. It's possible the problem originated from the local conversion to GGUF. You may want to grab one of the GGUF versions on Hugging Face.