
[Bug] "ValueError: too many values to unpack" when trying to run Qwen3-30B-A3B with v0.3 ktransformers

Open zondie17 opened this issue 7 months ago • 7 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
  • [x] 5. To help the community, I will use Chinese/English or attach a Chinese/English translation if using another language. Non-Chinese/English content without translation may be closed.

Describe the bug

On the first run I got AttributeError: 'Qwen3MoeForCausalLM' object has no attribute '_get_logits_warper'. Did you mean: '_get_logits_processor'?, the same as https://github.com/kvcache-ai/ktransformers/issues/1238, so I applied the fixes provided by @vickiegpt and @createthis, and then got another error:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/local_chat.py", line 187, in <module>
    fire.Fire(local_chat)
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/local_chat.py", line 181, in local_chat
    generated = prefill_and_generate(
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/util/utils.py", line 225, in prefill_and_generate
    logits = chunk_prefill(inputs[:, chunk_start:chunk_end], cache_position[chunk_start:chunk_end], past_key_values)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/util/utils.py", line 182, in chunk_prefill
    logits = model(
             ^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/models/modeling_qwen3_moe.py", line 1172, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/operators/models.py", line 363, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/models/modeling_qwen3_moe.py", line 383, in forward
    hidden_states = self.input_layernorm(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/operators/layernorm.py", line 144, in forward
    bsz, hidden_size = x.shape
    ^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)
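
For reference, the failing line in ktransformers/operators/layernorm.py two-way-unpacks the input tensor's shape, which only works for 2-D input. Below is a minimal sketch of why this raises during prefill, where hidden states are usually 3-D, plus one possible workaround; the shapes and the flatten-and-restore approach are assumptions for illustration, not the project's actual fix:

import torch

# During prefill the hidden states are typically 3-D:
# [batch, seq_len, hidden_size], so a two-way unpack of the shape raises.
x = torch.randn(1, 512, 2048)  # hypothetical prefill activations
try:
    bsz, hidden_size = x.shape
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)

# Hypothetical workaround: flatten the leading dimensions before the norm
# kernel and restore the original shape afterwards.
orig_shape = x.shape
x2d = x.reshape(-1, orig_shape[-1])  # [batch * seq_len, hidden_size]
bsz, hidden_size = x2d.shape         # now unpacks cleanly
y = x2d                              # ... norm kernel would run here ...
y = y.reshape(orig_shape)            # back to [batch, seq_len, hidden_size]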

Reproduction

Qwen3 GGUF: https://huggingface.co/lmstudio-community/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q4_K_M.gguf
ktransformers version: 753049
Modifications made are as follows:

diff --git a/ktransformers/local_chat.py b/ktransformers/local_chat.py
index 928de48..973c476 100644
--- a/ktransformers/local_chat.py
+++ b/ktransformers/local_chat.py
@@ -25,6 +25,7 @@ import fire
 from ktransformers.optimize.optimize import optimize_and_load_gguf
 from ktransformers.models.modeling_deepseek import DeepseekV2ForCausalLM
 from ktransformers.models.modeling_qwen2_moe import Qwen2MoeForCausalLM
+from ktransformers.models.modeling_qwen3_moe import Qwen3MoeForCausalLM
 from ktransformers.models.modeling_deepseek_v3 import DeepseekV3ForCausalLM
 from ktransformers.models.modeling_llama import LlamaForCausalLM
 from ktransformers.models.modeling_mixtral import MixtralForCausalLM
@@ -37,6 +38,7 @@ custom_models = {
     "DeepseekV2ForCausalLM": DeepseekV2ForCausalLM,
     "DeepseekV3ForCausalLM": DeepseekV3ForCausalLM,
     "Qwen2MoeForCausalLM": Qwen2MoeForCausalLM,
+    "Qwen3MoeForCausalLM": Qwen3MoeForCausalLM,
     "LlamaForCausalLM": LlamaForCausalLM,
     "MixtralForCausalLM": MixtralForCausalLM,
 }
diff --git a/ktransformers/models/modeling_qwen3_moe.py b/ktransformers/models/modeling_qwen3_moe.py
index 175f88c..0bec253 100644
--- a/ktransformers/models/modeling_qwen3_moe.py
+++ b/ktransformers/models/modeling_qwen3_moe.py
@@ -1049,6 +1049,46 @@ class Qwen3MoeForCausalLM(Qwen3MoePreTrainedModel, GenerationMixin):
         # Initialize weights and apply final processing
         self.post_init()
 
+    def _get_logits_warper(self, generation_config, device=None):
+        from transformers.generation.logits_process import (
+            LogitsProcessorList,
+            TemperatureLogitsWarper,
+            TopPLogitsWarper,
+            TopKLogitsWarper,
+            MinPLogitsWarper,
+            TypicalLogitsWarper,
+        )
+        
+        # Initialize with an empty processor list
+        warpers = LogitsProcessorList()
+        
+        # Add temperature warper if applicable
+        temperature = generation_config.temperature
+        if temperature is not None and temperature != 1.0:
+            warpers.append(TemperatureLogitsWarper(temperature))
+            
+        # Add top_p warper if applicable
+        top_p = generation_config.top_p
+        if top_p is not None and top_p < 1.0:
+            warpers.append(TopPLogitsWarper(top_p))
+        
+        # Add top_k warper if applicable
+        top_k = generation_config.top_k
+        if top_k is not None and top_k != 0:
+            warpers.append(TopKLogitsWarper(top_k))
+            
+        # Add min_p warper if applicable
+        min_p = getattr(generation_config, "min_p", None)
+        if min_p is not None and min_p > 0.0:
+            warpers.append(MinPLogitsWarper(min_p))
+            
+        # Add typical_p warper if applicable
+        typical_p = getattr(generation_config, "typical_p", None)
+        if typical_p is not None and 0 < typical_p < 1.0:
+            warpers.append(TypicalLogitsWarper(typical_p))
+            
+        return warpers
+
     def get_input_embeddings(self):
         return self.model.embed_tokens

Command used:

python -m ktransformers.local_chat --architectures Qwen3MoeForCausalLM --model_path Qwen/Qwen3-30B-A3B --gguf_path ./Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-Q4_K_M.gguf --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml --backend_type balance_serve

Environment

OS: Ubuntu 22.04.1
Hardware: AMD EPYC 7V13, A100 80GB
Software: Python 3.11, PyTorch 2.7.0, CUDA 12.8, CMake 3.29.6

zondie17 avatar May 06 '25 08:05 zondie17

I'm not a member of the ktransformers team, just a user, but I'm pretty sure Qwen3 was never intended to work with the ktransformers backend. It works with the balance_serve backend though.

createthis avatar May 06 '25 13:05 createthis

I'm not a member of the ktransformers team, just a user, but I'm pretty sure Qwen3 was never intended to work with the ktransformers backend. It works with the balance_serve backend though.

Thanks, but after making the above modifications and reinstalling ktransformers, I still encountered the same ValueError when using the balance_serve backend. Have you ever encountered this issue?

zondie17 avatar May 08 '25 02:05 zondie17

@zondie17 have you tried reverting your patch? It shouldn't be necessary for balance_serve. I patched ktransformers/models/modeling_deepseek_v3.py to run deepseek-v3 under the ktransformers backend, but I didn't have to patch ktransformers/models/modeling_qwen3_moe.py.

It's been a while since I was using that version, so I'm going off memory and my patch notes.

createthis avatar May 08 '25 02:05 createthis

@zondie17 have you tried reverting your patch? It shouldn't be necessary for balance_serve. I patched ktransformers/models/modeling_deepseek_v3.py to run deepseek-v3 under the ktransformers backend, but I didn't have to patch ktransformers/models/modeling_qwen3_moe.py.

It's been a while since I was using that version, so I'm going off memory and my patch notes.

Thanks, may I ask which version you are using now?

zondie17 avatar May 08 '25 03:05 zondie17

Thanks, may I ask which version you are using now?

I'm not using Qwen3. I tried it with balance_serve and it was OK, but it lacked the long context I need for agentic purposes. I also tried it with llama.cpp with long context, but that didn't work well; it just kept looping on a file I told it to read.

I'm currently running Deepseek-V3-0324:671b-Q4_K_M with ktransformers using the ktransformers backend. I have it checked out from main at commit 8dc1ab right now, which is somewhere between 0.2.4-post1 and 0.3.

It's not perfect. It crashes a lot. But it gives me long context and it's pretty fast on my machine and it follows instructions really well, so I like it.

createthis avatar May 08 '25 03:05 createthis

It's probably because you are using KQwen3MoeRMSNorm in the optimize rules. I think there are some differences in the interfaces of the different methods; you can try replacing it with DeepseekV3RMSNormTorch in the optimize rules YAML file.
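
If you want to try that, here is a hypothetical sketch of what the rule swap might look like in Qwen3Moe-serve.yaml; the match pattern, class paths, and kwargs are assumptions modeled on other ktransformers optimize-rule files, not a verified excerpt:

# Hypothetical: route the Qwen3 MoE RMSNorm modules to the Torch-based
# implementation visuOwO suggests, instead of KQwen3MoeRMSNorm.
- match:
    class: ktransformers.models.modeling_qwen3_moe.Qwen3MoeRMSNorm
  replace:
    class: ktransformers.operators.layernorm.DeepseekV3RMSNormTorch  # was KQwen3MoeRMSNorm
    kwargs:
      generate_device: "cuda"
      prefill_device: "cuda"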

visuOwO avatar Aug 15 '25 05:08 visuOwO

My ktransformers version is v0.3.1, and whether or not I add --backend_type balance_serve, I also get the ValueError: too many values to unpack (expected 2) error. Did you ever manage to run the qwen3-moe model with this local_chat approach?

PPXGS avatar Aug 29 '25 09:08 PPXGS