
[Feature] support qwen3-coder-480b-a35b

Open lWolvesl opened this issue 5 months ago • 2 comments

Background

Qwen3-Coder is an agentic coding model that is useful for everyday coding. Can it run correctly on ktransformers?

Related resources

GitHub · Hugging Face

lWolvesl · Jul 24 '25 14:07

Is this speed normal?

  • CPU: Intel(R) Xeon(R) Platinum 8468V, 80 cores
  • GPU: NVIDIA A100 80GB
python ktransformers/server/main.py \
  --architectures Qwen3MoeForCausalLM \
  --model_path /workspace/qwen3/models/Qwen3-Coder-480B-A35B-Instruct \
  --gguf_path /workspace/qwen3/models/qwen3-coder-480b-q4_k_m \
  --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml \
  --backend_type balance_serve \
  --port 8080
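
To cross-check the server-side numbers, one can also measure end-to-end decode speed from the client. A minimal sketch, assuming the balance_serve backend exposes an OpenAI-compatible streaming /v1/chat/completions endpoint on port 8080 (the endpoint path, model name, and the one-token-per-chunk approximation below are assumptions, not confirmed behavior):

import json
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed OpenAI-compatible endpoint
payload = {
    "model": "Qwen3-Coder-480B-A35B-Instruct",      # assumed model name
    "messages": [{"role": "user", "content": "Write a Python quicksort."}],
    "max_tokens": 256,
    "stream": True,
}

start = time.time()
chunks = 0
with requests.post(URL, json=payload, stream=True, timeout=600) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # SSE stream: each event line looks like "data: {...}" or "data: [DONE]"
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        if delta.get("content"):
            chunks += 1  # rough count: roughly one content chunk per token
elapsed = time.time() - start
print(f"{chunks} chunks in {elapsed:.1f} s -> ~{chunks / elapsed:.2f} tokens/s")
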

For example, the per-step decode log looks like this:

decode_batch_i: 1,
Model execution time (GPU): 299.299 ms, 3.341 tokens/s
1614
decode_batch_i: 1,
Model execution time (GPU): 220.331 ms, 4.539 tokens/s
38005
decode_batch_i: 1,
Model execution time (GPU): 222.258 ms, 4.499 tokens/s
58252
decode_batch_i: 1,
Model execution time (GPU): 216.355 ms, 4.622 tokens/s
42387
decode_batch_i: 1,
Model execution time (GPU): 391.508 ms, 2.554 tokens/s
1005
decode_batch_i: 1,
Model execution time (GPU): 266.367 ms, 3.754 tokens/s
35083
decode_batch_i: 1,
Model execution time (GPU): 216.006 ms, 4.630 tokens/s
10116
decode_batch_i: 1,
Model execution time (GPU): 212.021 ms, 4.717 tokens/s
15
decode_batch_i: 1,
Model execution time (GPU): 222.697 ms, 4.490 tokens/s
921
decode_batch_i: 1,
Model execution time (GPU): 226.164 ms, 4.422 tokens/s
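
Averaging the per-step throughput values quoted above gives a rough overall decode speed (a quick back-of-the-envelope check, using only the ten values pasted here):

# Values copied from the decode log above.
tok_per_s = [3.341, 4.539, 4.499, 4.622, 2.554, 3.754, 4.630, 4.717, 4.490, 4.422]
print(f"mean decode speed: {sum(tok_per_s) / len(tok_per_s):.2f} tokens/s")
# -> mean decode speed: 4.16 tokens/s
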

lWolvesl · Jul 25 '25 03:07