
Bug: conversion to BF16 fails for Kimi K2 Thinking

Open · Lissanro opened this issue 1 month ago • 16 comments

What happened?

When trying to convert https://huggingface.co/moonshotai/Kimi-K2-Thinking to BF16 using this command:

python3 ~/pkgs/ik_llama.cpp/convert_hf_to_gguf.py --outtype bf16 \
--outfile /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16.gguf  \
/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking --split-max-size 50G

...it fails (please check the log below). The same command works with mainline llama.cpp, so the convert_hf_to_gguf.py updates from https://github.com/ggml-org/llama.cpp/pull/17069 are likely not included here yet. ubergarm mentioned success making quants for ik_llama.cpp using the mainline conversion script, so I assume that should work as a workaround in the meantime. (I am currently still running that conversion; I want to experiment with different ubergarm recipes and quantization settings, and integrate the jinja chat template with the Unsloth fixes, which is why I downloaded the original release to generate my own GGUFs.) Still, it would be great if the ik_llama.cpp tools could also convert to BF16, if possible.

Name and Version

Latest git

What operating system are you seeing the problem on?

No response

Relevant log output

> python3 ~/pkgs/ik_llama.cpp/convert_hf_to_gguf.py --outtype bf16 --outfile /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16.gguf /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking --split-max-size 50G
INFO:hf-to-gguf:Loading model: Kimi-K2-Thinking
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-000062.safetensors'
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.bfloat16 --> BF16, shape = {18432, 7168}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,        torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_up.weight,          torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.0.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.0.attn_kv_b.weight,       torch.bfloat16 --> BF16, shape = {512, 16384}
INFO:hf-to-gguf:blk.0.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 32768}
INFO:hf-to-gguf:blk.0.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 8192}
INFO:hf-to-gguf:blk.0.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.0.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.0.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-000062.safetensors'
INFO:hf-to-gguf:blk.1.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
Traceback (most recent call last):
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 4860, in <module>
    main()
    ~~~~^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 4854, in main
    model_instance.write()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 430, in write
    self.prepare_tensors()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 3748, in prepare_tensors
    super().prepare_tensors()
    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 285, in prepare_tensors
    for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
                                                                ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lissanro/pkgs/ik_llama.cpp/convert_hf_to_gguf.py", line 3710, in modify_tensors
    datas.append(self._experts[bid][ename])
                 ~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: 'model.layers.1.mlp.experts.0.down_proj.weight'
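
For context on the traceback: the DeepSeek-V3 family handling in convert_hf_to_gguf.py buffers each layer's expert tensors and only stacks them once every expert has been collected, so a tensor that never shows up under the expected name (or that this version of the script does not map yet) surfaces as exactly this kind of KeyError. Below is a minimal sketch of that pattern, not the actual converter code, with a hypothetical reduced expert count (the real model routes 384 experts per layer, as the gate shapes in the log show):

n_experts = 4  # hypothetical; the log below shows 384 routed experts per layer

def stack_experts(buffered, bid, proj):
    # Gather every expert's tensor for this layer/projection before stacking.
    datas = []
    for xid in range(n_experts):
        ename = f"model.layers.{bid}.mlp.experts.{xid}.{proj}.weight"
        datas.append(buffered[ename])  # KeyError if the tensor was never buffered
    return datas

# Simulate expert 0's down_proj missing (or stored under a name the script does not map):
buffered = {
    f"model.layers.1.mlp.experts.{xid}.down_proj.weight": object()
    for xid in range(1, n_experts)
}
try:
    stack_experts(buffered, bid=1, proj="down_proj")
except KeyError as err:
    print("missing tensor:", err)  # 'model.layers.1.mlp.experts.0.down_proj.weight'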

Lissanro · Nov 12 '25 04:11

Thanks for the report.

Can you post the output of llama.cpp's version of convert_hf_to_gguf.py?

ikawrakow · Nov 12 '25 05:11

Sure, here it is. (I ran out of space on my 8 TB NVMe, so I ended up converting on a slow HDD, which took a while; it only just finished.) The log is attached below.

Another thing I noticed: it asked whether I wish to allow custom code and defaulted to "no". I am not sure whether I need it, or whether it is even relevant for GGUF. I will tag @ubergarm because I think he already has experience with this conversion; I would appreciate confirmation that I am doing the BF16 conversion correctly for the Kimi K2 Thinking model. The part of the log relevant to this question:

...contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
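
For what it's worth, that warning comes from transformers declining to execute the modeling code bundled with the checkpoint; the converter then falls back to reading config.json directly, and the export proceeds anyway (as the full log below shows). If you want to double-check what the custom code would expose, a quick sketch (assuming transformers is installed and using your local model path) is:

from transformers import AutoConfig

# Load the checkpoint's config while allowing its bundled custom code to run.
# This only affects loading through transformers; the GGUF converter itself
# reads config.json and the safetensors shards directly.
cfg = AutoConfig.from_pretrained(
    "/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking",
    trust_remote_code=True,
)
print(cfg.architectures, cfg.num_hidden_layers, getattr(cfg, "n_routed_experts", None))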

Full conversion log including the command I used:

> python3 ~/pkgs/llama.cpp/convert_hf_to_gguf.py --outtype bf16 \
--outfile /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16.gguf \
/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking --split-max-size 50G
INFO:hf-to-gguf:Loading model: Kimi-K2-Thinking
WARNING:hf-to-gguf:Failed to load model config from /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking: The repository /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: DeepseekV3ForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking: The repository /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00003-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00004-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00005-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00006-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00007-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00008-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00009-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00010-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00011-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00012-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00013-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00014-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00015-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00016-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00017-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00018-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00019-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00020-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00021-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00022-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00023-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00024-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00025-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00026-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00027-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00028-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00029-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00030-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00031-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00032-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00033-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00034-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00035-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00036-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00037-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00038-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00039-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00040-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00041-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00042-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00043-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00044-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00045-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00046-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00047-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00048-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00049-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00050-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00051-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00052-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00053-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00054-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00055-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00056-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00057-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00058-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00059-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00060-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00061-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00062-of-000062.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.bfloat16 --> BF16, shape = {18432, 7168}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,        torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_up.weight,          torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.0.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.0.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.0.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.0.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.0.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.0.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.1.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.1.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.1.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.1.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.1.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.1.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.1.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.1.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.1.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.1.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.1.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.2.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.2.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.2.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.2.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.2.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.2.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.2.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.2.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.2.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.2.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.2.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.2.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.2.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.3.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.3.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.3.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.3.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.3.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.3.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.3.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.3.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.3.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.3.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.3.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.3.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.4.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.4.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.4.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.4.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.4.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.4.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.4.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.4.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.4.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.4.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.4.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.4.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.5.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.5.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.5.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.5.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.5.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.5.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.5.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.5.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.5.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.5.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.5.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.6.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.6.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.6.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.6.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.6.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.6.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.6.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.6.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.6.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.6.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.6.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.6.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.7.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.7.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.7.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.7.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.7.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.7.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.7.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.7.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.7.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.7.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.7.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.7.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.8.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.8.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.8.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.8.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.8.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.8.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.8.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.8.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.9.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.9.exp_probs_b.bias,       torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.9.ffn_down_shexp.weight,  torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.9.ffn_gate_shexp.weight,  torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.9.ffn_up_shexp.weight,    torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.9.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.9.attn_kv_a_mqa.weight,   torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.9.attn_k_b.weight,        torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.9.attn_v_b.weight,        torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.9.attn_output.weight,     torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.9.attn_q_a_norm.weight,   torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.attn_q_a.weight,        torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.9.attn_q_b.weight,        torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.10.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.10.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.10.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.10.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.10.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.10.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.10.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.10.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.10.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.10.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.10.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.10.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.11.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.11.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.11.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.11.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.11.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.11.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.11.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.11.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.11.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.11.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.11.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.12.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.12.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.12.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.12.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.12.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.12.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.12.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.12.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.12.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.12.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.12.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.12.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.13.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.13.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.13.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.13.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.13.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.13.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.13.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.13.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.13.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.13.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.13.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.13.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.14.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.14.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.14.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.14.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.14.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.14.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.14.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.14.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.14.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.14.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.14.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.14.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.15.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.15.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.15.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.15.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.15.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.15.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.15.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.15.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.15.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.15.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.15.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.15.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.16.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.16.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.16.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.16.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.16.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.16.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.16.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.16.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.16.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.16.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.16.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.16.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.16.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.17.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.17.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.17.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.17.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.17.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.17.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.17.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.17.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.17.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.17.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.17.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.18.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.18.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.18.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.18.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.18.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.18.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.18.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.18.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.18.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.18.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.18.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.18.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.19.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.19.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.19.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.19.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.19.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.19.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.19.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.19.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.19.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.19.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.19.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.19.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.20.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.20.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.20.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.20.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.20.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.20.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.20.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.20.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.20.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.20.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.20.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.20.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.21.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.21.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.21.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.21.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.21.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.21.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.21.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.21.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.21.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.21.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.21.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.21.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.22.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.22.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.22.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.22.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.22.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.22.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.22.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.22.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.22.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.22.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.22.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.22.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.23.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.23.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.23.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.23.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.23.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.23.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.23.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.23.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.23.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.23.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.23.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.24.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.24.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.24.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.24.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.24.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.24.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.24.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.24.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.24.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.24.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.24.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.24.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.25.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.25.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.25.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.25.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.25.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.25.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.25.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.25.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.25.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.25.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.25.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.25.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.26.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.26.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.26.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.26.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.26.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.26.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.26.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.26.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.26.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.26.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.26.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.26.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.27.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.27.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.27.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.27.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.27.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.27.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.27.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.27.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.27.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.27.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.27.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.27.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.28.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.28.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.28.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.28.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.28.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.28.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.28.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.28.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.28.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.28.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.28.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.28.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.28.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.28.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.29.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.29.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.29.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.29.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.29.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.29.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.29.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.29.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.29.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.29.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.29.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.29.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.29.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.30.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.30.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.30.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.30.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.30.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.30.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.30.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.30.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.30.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.30.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.30.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.30.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.30.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.30.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.31.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.31.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.31.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.31.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.31.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.31.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.31.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.31.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.31.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.31.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.31.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.31.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.31.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.31.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.32.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.32.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.32.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.32.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.32.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.32.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.32.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.32.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.32.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.32.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.32.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.32.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.32.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.32.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.32.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.33.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.33.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.33.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.33.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.33.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.33.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.33.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.33.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.33.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.33.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.33.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.33.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.33.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.33.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.33.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.34.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.34.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.34.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.34.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.34.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.34.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.34.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.34.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.34.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.34.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.34.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.34.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.34.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.34.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.34.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.35.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.35.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.35.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.35.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.35.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.35.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.35.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.35.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.35.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.35.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.35.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.35.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.35.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.35.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.36.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.36.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.36.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.36.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.36.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.36.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.36.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.36.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.36.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.36.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.36.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.36.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.36.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.36.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.36.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.37.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.37.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.37.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.37.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.37.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.37.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.37.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.37.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.37.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.37.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.37.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.37.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.37.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.37.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.37.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.38.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.38.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.38.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.38.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.38.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.38.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.38.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.38.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.38.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.38.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.38.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.38.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.38.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.38.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.38.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.39.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.39.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.39.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.39.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.39.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.39.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.39.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.39.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.39.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.39.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.39.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.39.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.39.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.39.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.39.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.40.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.40.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.40.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.40.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.40.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.40.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.40.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.40.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.40.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.40.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.40.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.40.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.40.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.40.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.40.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.41.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.41.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.41.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.41.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.41.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.41.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.41.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.41.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.41.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.41.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.41.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.41.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.41.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.41.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.42.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.42.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.42.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.42.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.42.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.42.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.42.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.42.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.42.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.42.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.42.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.42.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.42.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.42.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.42.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.43.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.43.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.43.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.43.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.43.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.43.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.43.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.43.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.43.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.43.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.43.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.43.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.43.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.43.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.43.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.44.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.44.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.44.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.44.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.44.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.44.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.44.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.44.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.44.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.44.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.44.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.44.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.44.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.44.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.44.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.45.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.45.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.45.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.45.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.45.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.45.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.45.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.45.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.45.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.45.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.45.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.45.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.45.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.45.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.45.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.46.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.46.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.46.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.46.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.46.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.46.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.46.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.46.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.46.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.46.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.46.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.46.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.46.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.46.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.46.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.47.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.47.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.47.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.47.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.47.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.47.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.47.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.47.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.47.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.47.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.47.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.47.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.47.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.47.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.47.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.48.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.48.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.48.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.48.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.48.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.48.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.48.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.48.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.48.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.48.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.48.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.48.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.48.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.48.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.48.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.49.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.49.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.49.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.49.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.49.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.49.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.49.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.49.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.49.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.49.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.49.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.49.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.49.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.49.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.49.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.50.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.50.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.50.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.50.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.50.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.50.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.50.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.50.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.50.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.50.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.50.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.50.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.50.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.50.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.50.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.51.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.51.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.51.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.51.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.51.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.51.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.51.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.51.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.51.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.51.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.51.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.51.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.51.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.51.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.51.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.52.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.52.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.52.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.52.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.52.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.52.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.52.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.52.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.52.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.52.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.52.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.52.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.52.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.52.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.52.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.53.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.53.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.53.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.53.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.53.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.53.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.53.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.53.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.53.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.53.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.53.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.53.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.53.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.53.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.53.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.54.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.54.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.54.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.54.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.54.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.54.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.54.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.54.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.54.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.54.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.54.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.54.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.54.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.54.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.54.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.55.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.55.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.55.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.55.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.55.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.55.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.55.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.55.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.55.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.55.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.55.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.55.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.55.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.55.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.55.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.56.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.56.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.56.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.56.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.56.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.56.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.56.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.56.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.56.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.56.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.56.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.56.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.56.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.56.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.56.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.57.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.57.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.57.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.57.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.57.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.57.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.57.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.57.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.57.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.57.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.57.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.57.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.57.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.57.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.57.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.58.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.58.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.58.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.58.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.58.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.58.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.58.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.58.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.58.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.58.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.58.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.58.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.58.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.58.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.58.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.59.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.59.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.59.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.59.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.59.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.59.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.59.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.59.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.59.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.59.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.59.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.59.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.59.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.59.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.59.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.60.attn_norm.weight,      torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.60.exp_probs_b.bias,      torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.60.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.60.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.60.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.60.ffn_up_shexp.weight,   torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.60.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.60.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.60.attn_kv_a_mqa.weight,  torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.60.attn_k_b.weight,       torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.60.attn_v_b.weight,       torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.60.attn_output.weight,    torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.60.attn_q_a_norm.weight,  torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.60.attn_q_a.weight,       torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.60.attn_q_b.weight,       torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:output.weight,                torch.bfloat16 --> BF16, shape = {7168, 163840}
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> BF16, shape = {7168, 163840}
INFO:hf-to-gguf:output_norm.weight,           torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.1.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.1.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.2.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight,   torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight,   torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight,     torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.28.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.28.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.28.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.29.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.29.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.29.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.30.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.30.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.30.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.31.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.31.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.31.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.32.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.32.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.32.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.33.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.33.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.33.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.34.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.34.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.34.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.35.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.35.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.35.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.36.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.36.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.36.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.37.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.37.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.37.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.38.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.38.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.38.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.39.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.39.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.39.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.40.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.40.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.40.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.41.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.41.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.41.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.42.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.42.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.42.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.43.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.43.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.43.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.44.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.44.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.44.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.45.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.45.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.45.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.46.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.46.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.46.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.47.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.47.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.47.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.48.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.48.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.48.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.49.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.49.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.49.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.50.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.50.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.50.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.51.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.51.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.51.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.52.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.52.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.52.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.53.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.53.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.53.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.54.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.54.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.54.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.55.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.55.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.55.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.56.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.56.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.56.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.57.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.57.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.57.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.58.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.58.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.58.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.59.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.59.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.59.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.60.ffn_down_exps.weight,  torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.60.ffn_gate_exps.weight,  torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.60.ffn_up_exps.weight,    torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 262144
INFO:hf-to-gguf:gguf: embedding length = 7168
INFO:hf-to-gguf:gguf: feed forward length = 18432
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 1
INFO:hf-to-gguf:gguf: rope theta = 50000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 8
INFO:hf-to-gguf:gguf: expert groups count = 1
INFO:hf-to-gguf:gguf: expert groups used count = 1
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
The repository /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking .
 You can inspect the repository content at https://hf.co//mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] INFO:transformers_modules.Kimi-K2-Thinking.tokenization_kimi:Reloaded tiktoken model from /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking/tiktoken.model
INFO:transformers_modules.Kimi-K2-Thinking.tokenization_kimi:#words: 163840 - BOS ID: 163584 - EOS ID: 163585
INFO:gguf.vocab:Setting special token type bos to 163584
INFO:gguf.vocab:Setting special token type eos to 163586
INFO:gguf.vocab:Setting special token type pad to 163839
INFO:gguf.vocab:Setting chat_template to {# Unsloth template fixes #}
{%- macro render_content(msg) -%}
    {%- set c = msg.get('content') -%}
    {%- if c is string -%}
      {{ c }}
    {%- elif c is not none -%}
      {% for content in c -%}
        {% if content['type'] == 'image' or 'image' in content or 'image_url' in content -%}
          <|media_start|>image<|media_content|><|media_pad|><|media_end|>
        {% else -%}
          {{ content['text'] }}
        {%- endif -%}
      {%- endfor -%}
    {%- endif -%}
{%- endmacro -%}

{% macro set_roles(message) -%}
  {%- set role_name =  message.get('name') or  message['role'] -%}
  {%- if message['role'] == 'user' -%}
    <|im_user|>{{role_name}}<|im_middle|>
  {%- elif message['role'] == 'assistant' -%}
    <|im_assistant|>{{role_name}}<|im_middle|>
  {%- else -%}
    <|im_system|>{{role_name}}<|im_middle|>
  {%- endif -%}
{%- endmacro -%}


{%- macro render_toolcalls(message) -%}
  <|tool_calls_section_begin|>
  {%- for tool_call in message['tool_calls'] -%}
    {%- set formatted_id = tool_call['id'] -%}
    <|tool_call_begin|>{{ formatted_id }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>
  {%- endfor -%}
  <|tool_calls_section_end|>
{%- endmacro -%}


{# Find last non-tool-call assisitant message #}
{%- set ns = namespace(last_non_tool_call_assistant_msg=-1) -%}
{%- for idx in range(messages|length-1, -1, -1) -%}
    {%- if messages[idx]['role'] == 'assistant' and not messages[idx].get('tool_calls') -%}
        {%- set ns.last_non_tool_call_assistant_msg = idx -%}
        {%- break -%}
    {%- endif -%}
{%- endfor -%}

{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}

{%- if tools -%}{%- set tools_json = tools | tojson -%}{%- set tools_json = tools_json.replace(", ", ",") -%}{%- set tools_json = tools_json.replace(": ", ":") -%}
  <|im_system|>tool_declare<|im_middle|>{{ tools_json }}<|im_end|>
{%- endif -%}

{%- if messages and messages|length > 0 -%}
  {%- if messages[0]['role'] != 'system' -%}
  <|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|>
  {%- endif -%}
{%- endif -%}

{%- for message in hist_msgs -%}
  {{set_roles(message)}}
  {%- if message['role'] == 'assistant' -%}
    <think></think>{{render_content(message)}}
    {%- if message.get('tool_calls') -%}
      {{render_toolcalls(message)}}
    {%- endif -%}
  {%- elif message['role'] == 'tool' -%}
    {%- set tool_call_id = message.tool_call_id -%}
    ## Return of {{ tool_call_id }}
{{render_content(message)}}
  {%- elif message['content'] is not none -%}
    {{render_content(message)}}
  {%- endif -%}
  <|im_end|>
{%- endfor -%}

{%- for message in suffix_msgs -%}
  {{set_roles(message)}}
  {%- if message['role'] == 'assistant' -%}
    {%- set rc = message.get('reasoning_content', '') -%}
    <think>{{rc}}</think>{{render_content(message)}}
    {%- if message.get('tool_calls') -%}
     {{render_toolcalls(message)}}
    {%- endif -%}
  {%- elif message['role'] == 'tool' -%}
    {%- set tool_call_id = message.tool_call_id -%}
    ## Return of {{ tool_call_id }}
{{render_content(message)}}
  {%- elif message['content'] is not none -%}
    {{render_content(message)}}
  {%- endif -%}
  <|im_end|>
{%- endfor -%}


{%- if add_generation_prompt -%}
  <|im_assistant|>assistant<|im_middle|>
{%- endif -%}
{# Copyright 2025-present Unsloth. Apache 2.0 License. #}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00001-of-00046.gguf: n_tensors = 918, total_size = 46.3G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00002-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00003-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00004-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00005-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00006-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00007-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00008-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00009-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00010-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00011-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00012-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00013-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00014-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00015-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00016-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00017-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00018-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00019-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00020-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00021-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00022-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00023-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00024-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00025-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00026-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00027-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00028-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00029-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00030-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00031-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00032-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00033-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00034-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00035-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00036-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00037-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00038-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00039-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00040-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00041-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00042-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00043-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00044-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00045-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/Kimi-K2-Thinking-BF16-00046-of-00046.gguf: n_tensors = 2, total_size = 22.5G
[...progress bar was here...] 2.05T/2.05T [13:57:31<00:00, 40.9Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /mnt/Toshiba_Canvio_4TB_Top_Left/neuro/Kimi-K2-Thinking-BF16/

Lissanro avatar Nov 12 '25 21:11 Lissanro

@Lissanro

So the new Kimi-K2-Thinking model was not released as standard full bf16 or fp8 safetensors.

They are using a project called compressed-tensors (the link is an issue I opened over there). The config.json carries an arbitrary quantization configuration; in this case they used what appears to be 4-bit weights in blocks of 32 with a bf16 scale.

However, q4_0 is 4-bit weights in blocks of 32 with an f16 scale, so it isn't necessarily a perfect match.

There were two PRs on mainline, and this is the one I used: ~~https://github.com/ggml-org/llama.cpp/pull/17064~~ EDIT: sorry, I used 17069. I'll elaborate below.

The approach there is not to use the compressed-tensors library to first decompress into a full bf16 (as that supposedly takes 2.5 TB of RAM according to Unsloth's Dan): https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions/2

Instead, it just unpacks the quantized safetensors directly into an unquantized bf16 GGUF.

This is what I then used with ik_llama.cpp to make all my quants.
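
For intuition, here is a rough sketch of what that "unpack to bf16" step amounts to once the 4-bit values have been widened to signed integers. This is not the code from PR 17069; the tensor names, shapes, and packing here are assumptions for illustration only:

import torch

def dequant_int4_blocks_to_bf16(q: torch.Tensor, scales: torch.Tensor, block: int = 32) -> torch.Tensor:
    """Toy dequantizer: q holds 4-bit values already widened to signed ints
    (Moonshot's QAT reportedly stays within [-7, 7]); scales holds one bf16
    scale per block of 32 weights. The real compressed-tensors packing differs."""
    rows, cols = q.shape
    w = q.to(torch.float32).reshape(rows, cols // block, block)
    w = w * scales.to(torch.float32).unsqueeze(-1)   # broadcast the per-block scale
    return w.reshape(rows, cols).to(torch.bfloat16)  # what ends up in the BF16 GGUF

# Hypothetical toy tensors, just to show the shapes involved:
q      = torch.randint(-7, 8, (4, 64), dtype=torch.int8)
scales = torch.rand(4, 64 // 32, dtype=torch.bfloat16) * 0.02
w_bf16 = dequant_int4_blocks_to_bf16(q, scales)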

HOWEVER, there is some ongoing investigation now: the supposed q8_0/q4_0 recipe was showing worse perplexity than a full q8_0 in my own testing, suggesting something is off with either the QAT or the mapping of those routed experts into q4_0.

You can follow along in the PR and see that I've tested a new patched llama-quantize that achieves the "full quality" perplexity with q4_0 routed experts. jukofyork has been digging into it and opened a discussion with Moonshot directly to figure out whether there was a mistake or whether they intended this: https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions/26

The patch I'm currently testing is just one line in quantize_row_q4_0_ref():

$ git diff
diff --git a/ggml/src/ggml-quants.c b/ggml/src/ggml-quants.c
index 20a9831b..05feef4f 100644
--- a/ggml/src/ggml-quants.c
+++ b/ggml/src/ggml-quants.c
@@ -689,7 +689,7 @@ void quantize_row_q4_0_ref(const float * restrict x, block_q4_0 * restrict y, in
             }
         }

-        const float d  = max / -8;
+        const float d  = max / -7;
         const float id = d ? 1.0f/d : 0.0f;

         y[i].d = GGML_FP32_TO_FP16(d);

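For intuition on why that single character matters, here is a small standalone sketch (my own illustration, simplified from the reference quantizer rather than copied from it) that round-trips a block of fully symmetric QAT-style values through q4_0-style quantization with both divisors:

import numpy as np

def q4_0_roundtrip(x: np.ndarray, divisor: float) -> np.ndarray:
    # Simplified q4_0: scale derived from the signed element with the largest
    # magnitude, 4-bit signed levels clamped to [-8, 7], then dequantized again.
    max_val = x[np.argmax(np.abs(x))]
    d = max_val / divisor
    inv_d = 1.0 / d if d else 0.0
    q = np.clip(np.round(x * inv_d), -8, 7)
    return q * d

levels = np.arange(-7, 8)                                  # the 15 symmetric QAT levels
block = np.resize(levels, 32).astype(np.float32) * 0.01    # a block spanning the full [-7, 7] range

for div in (-8.0, -7.0):                                   # stock vs. patched divisor
    err = np.abs(q4_0_roundtrip(block, div) - block).max()
    print(f"divisor {div:+.0f}: max abs error = {err:.6f}")

On a block that really spans the full [-7, 7] grid, the /-7 divisor reconstructs the original values up to float rounding, while the stock /-8 cannot hit all 15 levels exactly.
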
So anyway, I'm hoping future safetensors models won't be released with willy-nilly QAT target quantization types via compressed-tensors that may or may not map nicely onto the available ik/llama.cpp quantization types.

Hope that helps explain why the not-yet-patched ik convert script works on the older Kimi-K2 but not on this new quantized release.

ubergarm avatar Nov 12 '25 22:11 ubergarm

@ubergarm

So, picking up 17064 from mainline is all it takes to support Kimi2-Thinking?

If the MoE tensors are effectively Q4_0, I'm wondering why they didn't add the ability to export directly to that. That would remove the need to modify the Q4_0 quantization function.

A few observations:

  • Using more than 4 bits for the routed experts makes no sense
  • If using 4 bits, it makes no sense to use anything but Q4_0.
  • If one insisted on making Q8_0 routed experts, then one would need to modify the Q8_0 quantization function as well to use max / -112 (or, more generally, max / $-7 i$ with $i \in [1, 16]$; see the short derivation right after this list)
  • In the linked discussion jukofyork states that almost all blocks of 32 weights use the full [-7, 7] range. If that is true, it means the accuracy of most sub-4-bit quantization types will be less than ideal. All except the Trellis quants are explicitly asymmetric, so they will not do well on data that has been made fully symmetric via the QAT training.
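
To spell out the arithmetic behind the $\max/-112$ suggestion (assuming a block really holds $w = k\,d_4$ with $k \in \{-7,\dots,7\}$ and attains $\pm 7 d_4$, which per jukofyork is almost always the case): choosing $d_8 = \max/(-7 i)$ gives $d_8 = \mp d_4 / i$, so every $w / d_8 = \mp k i$ is an integer with $|k i| \le 7 \cdot 16 = 112 \le 127$ and is stored exactly in the int8 block, whereas a divisor that is not a multiple of 7 generally rounds these ratios. The same reasoning is what makes the $\max/-7$ patch above exact for Q4_0, where $w/d = \mp k \in [-7, 7]$.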

ikawrakow avatar Nov 13 '25 08:11 ikawrakow

@ikawrakow

So, picking up 17064 from mainline is all it takes to support Kimi2-Thinking?

Oof, sorry, I made a mistake. The mainline PR I used to convert the moonshotai compressed-tensors safetensors to BF16 GGUF was 17069. So the one to pick up would be 17069, which manually decompresses compressed-tensors-formatted safetensors without a dependency on that library.

I'm wondering why they didn't add the ability to export directly to that. That would remove the need to modify the Q4_0 quantization function.

Correct, that was the first attempt, their PR 17064, which tried to just "repack" directly to Q4_0. However, the implementation had some issues, ballooned memory, and was not working, so it was closed in favor of 17069 for now.

While it is not ideal to need space for a full BF16 GGUF and to end up "round-tripping" through it back to Q4_0, it got things moving again and allowed at least some early quants to be released.

Using more than 4 bits for the routed experts makes no sense

Agreed, I only made a full Q8_0 for perplexity testing, to see how close my initial Q4_0 was. Given that the full Q8_0 was "better" than the Q4_0, it suggested there was some issue in the mapping, i.e. quality left on the table. Of course my Q8_0 here was not modified to use max / -112, so...

In the linked discussion jukofyork states that almost all blocks of 32 weights use the full [-7, 7] range. If that is true, it means the accuracy of most sub-4-bit quantization types will be less than ideal.

Yeah, moonshotai officially responded that they did indeed target symmetric quantization:

We used symmetric quantization and didn't use -8 to avoid extra quant bias. https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions/26#691605310274a99b0dfadc5c

All except the Trellis quants are explicitly asymmetric, so they will not do well on data that has been made fully symmetric via the QAT training.

Ahh, good to know. I'm guessing it isn't so easy to adjust the other quantization types with a single line to improve quality given that the input is symmetric, e.g. in quantize_row_iq3_k_impl() in iqk_quantize.cpp changing float d = max_abs_scale/31; to /15 instead, etc. EDIT: I tested this and it was slightly worse (0.0013).

Here is where I stand so far with the various perplexity measurements. All quantization code is unmodified, except that q4_0 was patched for the Q4_X baseline quant (not shown) and for an experimental Q4_X-IQ3_K mix (q4_0 ffn_down_exps and iq3_k ffn_(gate|up)_exps), with imatrix data used only for the Q4_X-IQ3_K (not released, as I'm just experimenting with these hacked mixes today):

[Image: perplexity comparison chart across the quant mixes]

ubergarm avatar Nov 13 '25 18:11 ubergarm

I successfully converted using the new @ubergarm recipe as described in the Q4_X section at https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF. It works, but I encountered another issue (mostly broken tool calling). At first I thought it could be a quantization issue, but upon further research I think the quant may be fine, since everything works when no tool calling is involved, so I reported it as a separate issue and shared what I was able to find there.

@ubergarm Can you please share your command for measuring perplexity? It would help to have it for reference. I know I can use llama-perplexity and reuse most of the arguments from the main llama-server command, but did you use the default context size and /wikitext-2-raw/wiki.test.raw for measurement? I just want to make sure I measure exactly the same way, to check whether I made my quant correctly.

Lissanro avatar Nov 14 '25 01:11 Lissanro

@Lissanro

Great, glad you were able to convert the compressed-tensors quantized safetensors to a bf16 GGUF and then to the mix of q4_0 routed experts with q8_0 for everything else. I assume you manually patched quantize_row_q4_0_ref() in ik_llama.cpp as shown above and recompiled before creating your "Q4_X"?

Can you please share your command to measure perplexity?

Sure, here it is again. Right, I always use the default 512 context and an unquantized f16 KV cache for my published numbers in the charts, and yes, the usual wiki.test.raw file.

$ wget https://huggingface.co/datasets/ikawrakow/validation-datasets-for-llama.cpp/resolve/main/wiki.test.raw.gz
$ gunzip wiki.test.raw.gz
$ ls -lah wiki.test.raw
-rw-rw-r-- 1 w w 1.3M Mar  5  2025 wiki.test.raw
$ sha1sum wiki.test.raw
6f1fe2054a940eebfc76b284b09680763b37f5ea  wiki.test.raw

$ numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-perplexity \
    -m "$model" \
    -f wiki.test.raw \
    --seed 1337 \
    -mla 3 \
    --ctx-size 512 \
    -ub 4096 -b 4096 \
    --numa numactl \
    --threads 96 \
    --threads-batch 128 \
    --no-mmap

The seed does nothing here; it is just for fun. I don't think you need -mla 3 anymore, as that is the default now. I specify the context size just to be explicit, but 512 is the default value. You can adjust batch size as needed for your rig (generally I avoid going over 4096); it doesn't affect the results. Of course, adjust threads, offload, and other options as desired.

I tested a recipe using iq3_kt for ffn_(gate|up)_exps, patched q4_0 for ffn_down_exps, q8_0 for the remaining repeating layers, and iq6_k for both token embedding and output. I only used the imatrix for the iq3_kt tensors and got: Final estimate: PPL over 568 chunks for n_ctx=512 = 2.1867 +/- 0.00969

Updated the above graph with the new datapoint.

ubergarm avatar Nov 14 '25 03:11 ubergarm

I'm guessing it isn't so easy to adjust the other quantization types with a single line to improve quality given that the input is symmetric, e.g. in quantize_row_iq3_k_impl() in iqk_quantize.cpp changing float d = max_abs_scale/31; to /15 instead, etc. EDIT: I tested this and it was slightly worse (0.0013).

They all use a lookup table, so no, it is not easy to change. Here is, for instance, the IQ3_K/IQ3_KS mapping:

GGML_TABLE_BEGIN(int8_t, iq3nl_values, 16)
    -63, -40, -23, -10, 1, 13, 28,  47,  
    -59, -36, -19,  -6, 5, 17, 32,  51,  
GGML_TABLE_END()

Changing the lookup table is of course possible, but that would break existing quantized models. Ideally one should be able to encode the IQK quants lookup tables in the GGUF. But that is a very significant change, so not something one could just do in a couple of hours.
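
As a rough illustration of the asymmetry point (this ignores the block scales, sub-scales, and parity selection the real IQ3_K quantizer uses, and is only meant to show grid shape versus data shape), one can nearest-neighbor-map the fully symmetric QAT levels onto a scaled copy of the grid above and compare against a symmetric reference grid:

import numpy as np

# The IQ3_K/IQ3_KS non-linear grid quoted above (both rows concatenated).
iq3nl_values = np.array([-63, -40, -23, -10, 1, 13, 28, 47,
                         -59, -36, -19,  -6, 5, 17, 32, 51], dtype=np.float64)

def best_rmse(levels: np.ndarray, grid: np.ndarray) -> float:
    """Nearest-neighbor map `levels` onto `s * grid`, scanning the scale `s`
    around the obvious magnitude ratio, and return the best RMS error."""
    base = np.max(np.abs(levels)) / np.max(np.abs(grid))
    best = np.inf
    for s in np.linspace(0.5, 1.5, 2001) * base:
        q = s * grid[np.argmin(np.abs(levels[:, None] - s * grid[None, :]), axis=1)]
        best = min(best, float(np.sqrt(np.mean((q - levels) ** 2))))
    return best

qat_block = 0.01 * np.arange(-7, 8)              # fully symmetric QAT levels, arbitrary scale
sym_grid  = np.arange(-7, 8, dtype=np.float64)   # a symmetric 15-level reference grid
print("asymmetric IQ3_K grid :", best_rmse(qat_block, iq3nl_values))
print("symmetric [-7, 7] grid:", best_rmse(qat_block, sym_grid))

The symmetric reference reproduces the levels essentially exactly (RMS error at floating-point noise level), while no scale in the scan lets the asymmetric non-linear grid land on all 15 symmetric levels.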

ikawrakow avatar Nov 14 '25 05:11 ikawrakow

It works, but encountered another issue (mostly broken tool calling),

@Lissanro There is PR #958 that is supposed to fix that. Can you try it? Thanks.

ikawrakow avatar Nov 14 '25 05:11 ikawrakow

@ikawrakow I have tried it, but tool calling still fails inside the think block. It just prints XML and then usually fails to generate any actual response, most likely because the model expected the tool call to return something but it did not. I provided a more detailed response in #955.

Lissanro avatar Nov 15 '25 21:11 Lissanro

Kimi-K2 should not emit tool calls inside the thinking block according to the official specification. However, multiple users have reported that the model sometimes generates tool calls during reasoning, and in a format that is completely different from the expected one. This suggests that the model is not fully aligned.

Given this, I believe the safer approach is either to simply ignore malformed tool calls that appear inside the reasoning block, or try to fully support these unexpected formats. Alternatively, we could design an entirely new parser that handles both tool-call patterns. But, once we support two formats, the model may start producing a third unsupported variant.

The current content/reasoning parser I wrote cannot robustly support multiple tool-call formats at once, though the tool-call parser can handle them if provided the right parameters.

hksdpc255 avatar Nov 16 '25 02:11 hksdpc255

Kimi K2 and Kimi K2 Thinking are completely different models. According to https://huggingface.co/moonshotai/Kimi-K2-Thinking it can emit multiple tool calls during thinking:

Deep Thinking & Tool Orchestration: End-to-end trained to interleave chain-of-thought reasoning with function calls

Also, it is important to keep in mind that whatever Roo Code outputs may not represent the actual output of the model exactly; it converts some markup, tags, etc. There is no reason to think the calls are malformed unless we inspect the exact tokens the model generated, but I am not sure how to get them.

Also, even when not thinking, it may emit reasoning tokens during tool calling, and in #955 I linked to a vLLM report that discusses how Kimi K2 Thinking works: "Prevent special token leakage in KimiK2ToolParser streaming mode" https://github.com/vllm-project/vllm/pull/28543 (maybe this explains why some tool calls outside of the thinking block are failing).

Lissanro avatar Nov 16 '25 03:11 Lissanro

According to the chat template, these two models differ only in whether they emit reasoning content; they otherwise share exactly the same tool-call format.

Also, see https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main/docs/tool_call_guidance.md

Manually Parsing Tool Calls

The tool call requests generated by Kimi-K2 can also be parsed manually, which is especially useful when the service you are using does not provide a tool-call parser. The tool call requests generated by Kimi-K2 are wrapped by <|tool_calls_section_begin|> and <|tool_calls_section_end|>, with each tool call wrapped by <|tool_call_begin|> and <|tool_call_end|>. The tool ID and arguments are separated by <|tool_call_argument_begin|>. The format of the tool ID is functions.{func_name}:{idx}, from which we can parse the function name.

And, https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905/blob/main/docs/tool_call_guidance.md

Seems just the same.
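
For reference, a minimal parsing sketch following the quoted markers (my own illustration, not the parser used by ik_llama.cpp or vLLM; the function name in the sample is made up, and it assumes well-formed output):

import json
import re

SECTION = re.compile(r"<\|tool_calls_section_begin\|>(.*?)<\|tool_calls_section_end\|>", re.DOTALL)
CALL    = re.compile(r"<\|tool_call_begin\|>(.*?)<\|tool_call_argument_begin\|>(.*?)<\|tool_call_end\|>", re.DOTALL)

def parse_tool_calls(text: str):
    calls = []
    for section in SECTION.findall(text):
        for tool_id, args in CALL.findall(section):
            # Tool ID format per the docs: functions.{func_name}:{idx}
            func_name = tool_id.strip().removeprefix("functions.").rsplit(":", 1)[0]
            calls.append({"id": tool_id.strip(),
                          "name": func_name,
                          "arguments": json.loads(args)})
    return calls

sample = ("<|tool_calls_section_begin|>"
          "<|tool_call_begin|>functions.get_weather:0<|tool_call_argument_begin|>"
          '{"city": "Beijing"}<|tool_call_end|>'
          "<|tool_calls_section_end|>")
print(parse_tool_calls(sample))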

hksdpc255 avatar Nov 16 '25 03:11 hksdpc255

Interesting, but they are very different in practice. Kimi K2 0905 has always been very reliable for me in Roo Code; I have been using it daily since its release. As soon as I tried K2 Thinking, almost none of the tool calls worked. With #955, at least the vast majority of tool calls outside the think block now work.

Lissanro avatar Nov 16 '25 04:11 Lissanro

I’ve found a potential issue, although I’m not yet sure whether it’s directly related to the problem you’re seeing. It looks like the Kimi-K2-Thinking template may still need additional fixes in this part:

  {%- elif message['role'] == 'tool' -%}
    {%- set tool_call_id = message.tool_call_id -%}
    ## Return of {{ tool_call_id }}
{{render_content(message)}}
  {%- elif message['content'] is not none -%}

message.tool_call_id may not be rendered as expected. I need help to fix that.

Edit: fixed by #996

hksdpc255 avatar Nov 16 '25 04:11 hksdpc255

I looked at PR 17069 in mainline. It is not copy/paste-able here. Hence, I'll leave it at that (use mainline to convert Kimi-K2-Thinking).

ikawrakow avatar Nov 17 '25 17:11 ikawrakow