
[Feature] support qwen3-coder-480b-a35b

Open lWolvesl opened this issue 5 months ago • 2 comments

Background

Qwen3-Coder is an agentic coding model that is useful for everyday coding. Can it run correctly on ktransformers?

Related resources

GitHub · Hugging Face

lWolvesl · Jul 24 '25 14:07

Is this speed normal?

  • CPU: Intel(R) Xeon(R) Platinum 8468V, 80 cores
  • GPU: NVIDIA A100 80GB
python ktransformers/server/main.py \
  --architectures Qwen3MoeForCausalLM \
  --model_path /workspace/qwen3/models/Qwen3-Coder-480B-A35B-Instruct \
  --gguf_path /workspace/qwen3/models/qwen3-coder-480b-q4_k_m \
  --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml \
  --backend_type balance_serve \
  --port 8080
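
To cross-check the server-side numbers, one can also measure end-to-end decode speed from the client. A minimal sketch, assuming the balance_serve backend exposes an OpenAI-compatible streaming /v1/chat/completions endpoint on port 8080 (the endpoint path, model name, and the one-token-per-chunk approximation below are assumptions, not confirmed behavior):

import json
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed OpenAI-compatible endpoint
payload = {
    "model": "Qwen3-Coder-480B-A35B-Instruct",      # assumed model name
    "messages": [{"role": "user", "content": "Write a Python quicksort."}],
    "max_tokens": 256,
    "stream": True,
}

start = time.time()
chunks = 0
with requests.post(URL, json=payload, stream=True, timeout=600) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # SSE stream: each event line looks like "data: {...}" or "data: [DONE]"
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        if delta.get("content"):
            chunks += 1  # rough count: roughly one content chunk per token
elapsed = time.time() - start
print(f"{chunks} chunks in {elapsed:.1f} s -> ~{chunks / elapsed:.2f} tokens/s")
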

For example, the per-step decode log looks like this:

decode_batch_i: 1,
Model execution time (GPU): 299.299 ms, 3.341 tokens/s
1614
decode_batch_i: 1,
Model execution time (GPU): 220.331 ms, 4.539 tokens/s
38005
decode_batch_i: 1,
Model execution time (GPU): 222.258 ms, 4.499 tokens/s
58252
decode_batch_i: 1,
Model execution time (GPU): 216.355 ms, 4.622 tokens/s
42387
decode_batch_i: 1,
Model execution time (GPU): 391.508 ms, 2.554 tokens/s
1005
decode_batch_i: 1,
Model execution time (GPU): 266.367 ms, 3.754 tokens/s
35083
decode_batch_i: 1,
Model execution time (GPU): 216.006 ms, 4.630 tokens/s
10116
decode_batch_i: 1,
Model execution time (GPU): 212.021 ms, 4.717 tokens/s
15
decode_batch_i: 1,
Model execution time (GPU): 222.697 ms, 4.490 tokens/s
921
decode_batch_i: 1,
Model execution time (GPU): 226.164 ms, 4.422 tokens/s
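
Averaging the per-step throughput values quoted above gives a rough overall decode speed (a quick back-of-the-envelope check, using only the ten values pasted here):

# Values copied from the decode log above.
tok_per_s = [3.341, 4.539, 4.499, 4.622, 2.554, 3.754, 4.630, 4.717, 4.490, 4.422]
print(f"mean decode speed: {sum(tok_per_s) / len(tok_per_s):.2f} tokens/s")
# -> mean decode speed: 4.16 tokens/s
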

lWolvesl · Jul 25 '25 03:07