[Bug] "ValueError: too many values to unpack" when trying to run Qwen3-30B-A3B with v0.3 ktransformers
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
- [x] 5. To help the community, I will use Chinese/English or attach a Chinese/English translation if using another language. Non-Chinese/English content without translation may be closed.
Describe the bug
On the first run I got AttributeError: 'Qwen3MoeForCausalLM' object has no attribute '_get_logits_warper'. Did you mean: '_get_logits_processor'? (the same error as https://github.com/kvcache-ai/ktransformers/issues/1238), so I applied the fixes provided by @vickiegpt and @createthis, and then got another error:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/local_chat.py", line 187, in <module>
fire.Fire(local_chat)
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/local_chat.py", line 181, in local_chat
generated = prefill_and_generate(
^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/util/utils.py", line 225, in prefill_and_generate
logits = chunk_prefill(inputs[:, chunk_start:chunk_end], cache_position[chunk_start:chunk_end], past_key_values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/util/utils.py", line 182, in chunk_prefill
logits = model(
^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/models/modeling_qwen3_moe.py", line 1172, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/operators/models.py", line 363, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/models/modeling_qwen3_moe.py", line 383, in forward
hidden_states = self.input_layernorm(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/miniconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/v-zhaoxiang/zhangyiwei/kTransformers/ktransformers/ktransformers/operators/layernorm.py", line 144, in forward
bsz, hidden_size = x.shape
^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)
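For context, the unpack presumably fails because the forward in ktransformers/operators/layernorm.py assumes a 2-D input of shape (bsz, hidden_size), while the prefill path hands it the usual 3-D hidden states of shape (bsz, seq_len, hidden_size). Below is a minimal shape-agnostic RMSNorm sketch to illustrate the mismatch; rmsnorm_forward is a hypothetical helper for illustration, not the actual KQwen3MoeRMSNorm code, which may depend on the 2-D layout for a fused kernel:

import torch

def rmsnorm_forward(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize over the last dimension instead of unpacking x.shape,
    # so both (bsz, hidden_size) and (bsz, seq_len, hidden_size) inputs work.
    orig_dtype = x.dtype
    x = x.to(torch.float32)
    variance = x.pow(2).mean(-1, keepdim=True)
    x = x * torch.rsqrt(variance + eps)
    return (weight * x).to(orig_dtype)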
Reproduction
Qwen3: https://huggingface.co/lmstudio-community/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q4_K_M.gguf
ktransformers version: 753049
Modifications made are as follows:
diff --git a/ktransformers/local_chat.py b/ktransformers/local_chat.py
index 928de48..973c476 100644
--- a/ktransformers/local_chat.py
+++ b/ktransformers/local_chat.py
@@ -25,6 +25,7 @@ import fire
from ktransformers.optimize.optimize import optimize_and_load_gguf
from ktransformers.models.modeling_deepseek import DeepseekV2ForCausalLM
from ktransformers.models.modeling_qwen2_moe import Qwen2MoeForCausalLM
+from ktransformers.models.modeling_qwen3_moe import Qwen3MoeForCausalLM
from ktransformers.models.modeling_deepseek_v3 import DeepseekV3ForCausalLM
from ktransformers.models.modeling_llama import LlamaForCausalLM
from ktransformers.models.modeling_mixtral import MixtralForCausalLM
@@ -37,6 +38,7 @@ custom_models = {
"DeepseekV2ForCausalLM": DeepseekV2ForCausalLM,
"DeepseekV3ForCausalLM": DeepseekV3ForCausalLM,
"Qwen2MoeForCausalLM": Qwen2MoeForCausalLM,
+ "Qwen3MoeForCausalLM": Qwen3MoeForCausalLM,
"LlamaForCausalLM": LlamaForCausalLM,
"MixtralForCausalLM": MixtralForCausalLM,
}
diff --git a/ktransformers/models/modeling_qwen3_moe.py b/ktransformers/models/modeling_qwen3_moe.py
index 175f88c..0bec253 100644
--- a/ktransformers/models/modeling_qwen3_moe.py
+++ b/ktransformers/models/modeling_qwen3_moe.py
@@ -1049,6 +1049,46 @@ class Qwen3MoeForCausalLM(Qwen3MoePreTrainedModel, GenerationMixin):
# Initialize weights and apply final processing
self.post_init()
+ def _get_logits_warper(self, generation_config, device=None):
+ from transformers.generation.logits_process import (
+ LogitsProcessorList,
+ TemperatureLogitsWarper,
+ TopPLogitsWarper,
+ TopKLogitsWarper,
+ MinPLogitsWarper,
+ TypicalLogitsWarper,
+ )
+
+ # Initialize with an empty processor list
+ warpers = LogitsProcessorList()
+
+ # Add temperature warper if applicable
+ temperature = generation_config.temperature
+ if temperature is not None and temperature != 1.0:
+ warpers.append(TemperatureLogitsWarper(temperature))
+
+ # Add top_p warper if applicable
+ top_p = generation_config.top_p
+ if top_p is not None and top_p < 1.0:
+ warpers.append(TopPLogitsWarper(top_p))
+
+ # Add top_k warper if applicable
+ top_k = generation_config.top_k
+ if top_k is not None and top_k != 0:
+ warpers.append(TopKLogitsWarper(top_k))
+
+ # Add min_p warper if applicable
+ min_p = getattr(generation_config, "min_p", None)
+ if min_p is not None and min_p > 0.0:
+ warpers.append(MinPLogitsWarper(min_p))
+
+ # Add typical_p warper if applicable
+ typical_p = getattr(generation_config, "typical_p", None)
+ if typical_p is not None and 0 < typical_p < 1.0:
+ warpers.append(TypicalLogitsWarper(typical_p))
+
+ return warpers
+
def get_input_embeddings(self):
return self.model.embed_tokens
python -m ktransformers.local_chat --architectures Qwen3MoeForCausalLM --model_path Qwen/Qwen3-30B-A3B --gguf_path ./Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-Q4_K_M.gguf --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml --backend_type balance_serve
Environment
OS: Ubuntu 22.04.1
Hardware: AMD EPYC 7V13, A100 80GB
Software: python 3.11, pytorch 2.7.0, cuda 12.8, cmake 3.29.6
I'm not a member of the ktransformers team, just a user, but I'm pretty sure Qwen3 was never intended to work with the ktransformers backend. It works with the balance_serve backend though.
Thanks, but after making the above modifications and reinstalling ktransformers, I still encountered the same ValueError when using the balance_serve backend. Have you ever encountered this issue?
@zondie17 have you tried reverting your patch? It shouldn't be necessary for balance_serve. I patched ktransformers/models/modeling_deepseek_v3.py to run deepseek-v3 under the ktransformers backend, but I didn't have to patch ktransformers/models/modeling_qwen3_moe.py.
It's been a while since I was using that version, so I'm going off memory and my patch notes.
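If the changes aren't committed, something like git checkout -- ktransformers/local_chat.py ktransformers/models/modeling_qwen3_moe.py should restore the stock files before rebuilding.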
Thanks, may I ask which version you are using now?
I'm not using Qwen3. I tried it with balance_serve and it was OK, but it lacked the long context I need for agentic purposes. I also tried it with llama.cpp and long context, but that didn't work well: it just kept looping on a file I told it to read.
I'm currently running Deepseek-V3-0324:671b-Q4_K_M with ktransformers using the ktransformers backend. I have it checked out from main at commit 8dc1ab right now, which is somewhere between 0.2.4-post1 and 0.3.
It's not perfect. It crashes a lot. But it gives me long context and it's pretty fast on my machine and it follows instructions really well, so I like it.
It's probably because you are using KQwen3MoeRMSNorm in the optimize rules. I think there are some differences in the interfaces of the different implementations; you can try replacing it with DeepseekV3RMSNormTorch in the optimize rules YAML file.
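A rough sketch of what that swap could look like in Qwen3Moe-serve.yaml; the match pattern and class path below are assumptions based on the usual optimize-rule layout and the module seen in the traceback, not the file's exact contents:

# Illustrative only: route the layer norms to the Torch RMSNorm operator.
- match:
    name: "^model\\.layers\\..*layernorm$"
  replace:
    class: ktransformers.operators.layernorm.DeepseekV3RMSNormTorch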
My ktransformers version is v0.3.1, and with or without --backend_type balance_serve I get the same ValueError: too many values to unpack (expected 2). Did you later manage to run the qwen3-moe model successfully via this local_chat approach?