
[Bad Case]: Doesn't work in LM Studio

Open dr-data opened this issue 1 year ago • 13 comments

Description

When loading "minicpm-2b-dpo-fp32.Q6_K.gguf" in LM Studio, it reports the error "create_tensor: tensor 'output.weight' not found". I don't know which Preset should be set.

Case Explanation

No response

dr-data avatar Feb 07 '24 11:02 dr-data

Hi, could you provide more details on how to reproduce this problem?

huangyuxiang03 avatar Feb 07 '24 13:02 huangyuxiang03

I got the same error in LM Studio: "llama.cpp error: 'create_tensor: tensor 'output.weight' not found'"

Impacts all models.

sungkim11 avatar Feb 07 '24 20:02 sungkim11

Hi, could you provide more details on how to reproduce this problem?

  • Download LM Studio: https://lmstudio.ai/
  • Search for and download MiniCPM through the search bar in LM Studio.
  • Select the MiniCPM model to load in LM Studio.
  • The error appears.
(Screenshot: LM Studio error dialog, 2024-02-08 at 12:00:55)

dr-data avatar Feb 08 '24 04:02 dr-data

Just found this PR merged into llama.cpp master. However, I still got the same error with llama.cpp b2100. Platform: Windows 11. Log file:

[1707396576] Log start
[1707396576] Cmd: D:\llama.cpp\main.exe -ngl 35 -m MiniCPM-2B-dpo.Q4_K_M.gguf --color -c 1024 --temp 0.3 --repeat_penalty 1.02 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{write a poem about love and death}\n\n### Response:"
[1707396576] main: build = 2098 (26d4efd1)
[1707396576] main: built with MSVC 19.37.32826.1 for x64
[1707396576] main: seed  = 1707396576
[1707396576] main: llama backend init
[1707396576] main: load the model and apply lora adapter, if any
[1707396576] llama_model_loader: loaded meta data with 22 key-value pairs and 362 tensors from MiniCPM-2B-dpo.Q4_K_M.gguf (version GGUF V3 (latest))
[1707396576] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[1707396576] llama_model_loader: - kv   0:                       general.architecture str              = llama
[1707396576] llama_model_loader: - kv   1:                               general.name str              = .
[1707396576] llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
[1707396576] llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2304
[1707396576] llama_model_loader: - kv   4:                          llama.block_count u32              = 40
[1707396576] llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5760
[1707396576] llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
[1707396576] llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 36
[1707396576] llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 36
[1707396576] llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
[1707396576] llama_model_loader: - kv  10:                          general.file_type u32              = 15
[1707396576] llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
[1707396576] llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,122753]  = ["<unk>", "<s>", "</s>", "<SEP>", "<C...
[1707396576] llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,122753]  = [0.000000, 0.000000, 0.000000, 0.0000...
[1707396576] llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,122753]  = [2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[1707396576] llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
[1707396576] llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
[1707396576] llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
[1707396576] llama_model_loader: - kv  18:               tokenizer.ggml.add_bos_token bool             = true
[1707396576] llama_model_loader: - kv  19:               tokenizer.ggml.add_eos_token bool             = false
[1707396576] llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
[1707396576] llama_model_loader: - kv  21:               general.quantization_version u32              = 2
[1707396576] llama_model_loader: - type  f32:   81 tensors
[1707396576] llama_model_loader: - type q5_0:   20 tensors
[1707396576] llama_model_loader: - type q8_0:   20 tensors
[1707396576] llama_model_loader: - type q4_K:  221 tensors
[1707396576] llama_model_loader: - type q6_K:   20 tensors
[1707396576] llm_load_vocab: mismatch in special tokens definition ( 3528/122753 vs 259/122753 ).
[1707396576] llm_load_print_meta: format           = GGUF V3 (latest)
[1707396576] llm_load_print_meta: arch             = llama
[1707396576] llm_load_print_meta: vocab type       = SPM
[1707396576] llm_load_print_meta: n_vocab          = 122753
[1707396576] llm_load_print_meta: n_merges         = 0
[1707396576] llm_load_print_meta: n_ctx_train      = 2048
[1707396576] llm_load_print_meta: n_embd           = 2304
[1707396576] llm_load_print_meta: n_head           = 36
[1707396576] llm_load_print_meta: n_head_kv        = 36
[1707396576] llm_load_print_meta: n_layer          = 40
[1707396576] llm_load_print_meta: n_rot            = 64
[1707396576] llm_load_print_meta: n_embd_head_k    = 64
[1707396576] llm_load_print_meta: n_embd_head_v    = 64
[1707396576] llm_load_print_meta: n_gqa            = 1
[1707396576] llm_load_print_meta: n_embd_k_gqa     = 2304
[1707396576] llm_load_print_meta: n_embd_v_gqa     = 2304
[1707396576] llm_load_print_meta: f_norm_eps       = 0.0e+00
[1707396576] llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
[1707396576] llm_load_print_meta: f_clamp_kqv      = 0.0e+00
[1707396576] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[1707396576] llm_load_print_meta: n_ff             = 5760
[1707396576] llm_load_print_meta: n_expert         = 0
[1707396576] llm_load_print_meta: n_expert_used    = 0
[1707396576] llm_load_print_meta: rope scaling     = linear
[1707396576] llm_load_print_meta: freq_base_train  = 10000.0
[1707396576] llm_load_print_meta: freq_scale_train = 1
[1707396576] llm_load_print_meta: n_yarn_orig_ctx  = 2048
[1707396576] llm_load_print_meta: rope_finetuned   = unknown
[1707396576] llm_load_print_meta: model type       = 13B
[1707396576] llm_load_print_meta: model ftype      = Q4_K - Medium
[1707396576] llm_load_print_meta: model params     = 2.72 B
[1707396576] llm_load_print_meta: model size       = 1.61 GiB (5.07 BPW) 
[1707396576] llm_load_print_meta: general.name     = .
[1707396576] llm_load_print_meta: BOS token        = 1 '<s>'
[1707396576] llm_load_print_meta: EOS token        = 2 '</s>'
[1707396576] llm_load_print_meta: UNK token        = 0 '<unk>'
[1707396576] llm_load_print_meta: LF token         = 1099 '<0x0A>'
[1707396576] llm_load_tensors: ggml ctx size =    0.28 MiB
[1707396576] llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found
[1707396576] llama_load_model_from_file: failed to load model
[1707396576] main: error: unable to load model

Does that mean MiniCPM is not yet fully supported by llama.cpp?
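
For what it's worth, the missing tensor can be confirmed from the file itself. A minimal check, assuming a llama.cpp checkout (gguf-dump.py lives under gguf-py/scripts there; the file name is the one from my log, and on Windows findstr stands in for grep):

python gguf-py/scripts/gguf-dump.py MiniCPM-2B-dpo.Q4_K_M.gguf | grep output.weight

If nothing is printed, the tensor really is absent. As far as I understand, MiniCPM ties its output head to the token embeddings, so a GGUF converted under the llama architecture carries no separate output.weight, while the llama loader expects one.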

Chaunice avatar Feb 08 '24 13:02 Chaunice

Just found this PR merged into llama.cpp master. However, I still got the same error with llama.cpp b2100 on Windows 11 (full log quoted above). Does that mean MiniCPM is not yet fully supported by llama.cpp?

May I ask how you obtained the MiniCPM-2B-dpo.Q4_K_M.gguf? Could you please try converting it from the original Hugging Face model using the latest code from the llama.cpp master branch?
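
For reference, the conversion is roughly as follows; this is only a sketch with illustrative paths, using the openbmb/MiniCPM-2B-dpo-fp16 checkpoint as the source:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt
make quantize
# convert the downloaded Hugging Face checkpoint to an f16 GGUF, then quantize it
python convert-hf-to-gguf.py /path/to/MiniCPM-2B-dpo-fp16 --outtype f16 --outfile minicpm-2b-dpo.f16.gguf
./quantize minicpm-2b-dpo.f16.gguf minicpm-2b-dpo.Q4_K_M.gguf Q4_K_M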

runfuture avatar Feb 08 '24 14:02 runfuture

Maybe LM Studio hasn't updated to the latest version of llama.cpp yet. Be patient and wait a while 😄

sweetcard avatar Feb 09 '24 00:02 sweetcard

@Chaunice For convenience, I have prepared a Colab notebook to convert the model to GGUF. Additionally, I have provided the converted GGUF models in the links below:

  1. MiniCPM-2B-dpo-q4km-gguf
  2. MiniCPM-2B-dpo-fp16-gguf

runfuture avatar Feb 09 '24 02:02 runfuture

LM Studio reports "llama.cpp error: 'unknown model architecture: 'minicpm''" with the GGUF you provided.

sungkim11 avatar Feb 09 '24 04:02 sungkim11

Hi, apologies for the delayed response. I've just tested the first model you shared, and it's working perfectly! I suppose the previous error might have been caused by an incompatible version of llama.cpp, as the model I initially used was from lastrosade/MiniCPM-2B-dpo-f32-gguf. Anyway, thanks for your contribution. Very helpful!

Chaunice avatar Feb 09 '24 06:02 Chaunice

No, it doesn't work even after updating LM Studio to the latest version.

The architecture of the model needs to be llama rather than minicpm; otherwise it generates an error.
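
A quick way to see which architecture a given GGUF declares, assuming a llama.cpp checkout and an illustrative file name (findstr is the Windows stand-in for grep):

python gguf-py/scripts/gguf-dump.py --no-tensors minicpm-2b-dpo-q4km.gguf | findstr general.architecture

Files converted by the new code report minicpm here, which the llama.cpp bundled in LM Studio apparently doesn't recognize yet; older conversions report llama but then fail on the missing output.weight tensor.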


dr-data avatar Feb 09 '24 06:02 dr-data

Same problem here.

jackylee1 avatar Feb 12 '24 04:02 jackylee1

I tried the latest version of llama.cpp and it runs normally. It seems we can only wait for an update; after all, LM Studio is not open-source software.
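
For reference, a run along these lines works for me; the model file name is illustrative, and MiniCPM-2B's prompt format uses the <用户> and <AI> turn markers:

./main -m minicpm-2b-dpo-q4km.gguf -n 256 --temp 0.3 -p "<用户>Write a short poem about love and death<AI>"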

cxzx150133 avatar Feb 12 '24 06:02 cxzx150133

In my testing, LM Studio 0.2.16 can already run the GGUF version of MiniCPM normally. Although "Unsupported Architecture" is displayed, it does not affect normal use.

cxzx150133 avatar Feb 23 '24 06:02 cxzx150133