[Feature] support qwen3-coder-480b-a35b
Background
qwen3-coder is an agentic coding model and is useful for daily coding work. Can it run correctly on ktransformers?
Related resources
Is this speed normal?
CPU: Intel(R) Xeon(R) Platinum 8468V (80 cores)
GPU: NVIDIA A100 80GB
python ktransformers/server/main.py \
--architectures Qwen3MoeForCausalLM \
--model_path /workspace/qwen3/models/Qwen3-Coder-480B-A35B-Instruct \
--gguf_path /workspace/qwen3/models/qwen3-coder-480b-q4_k_m \
--optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml \
--backend_type balance_serve \
--port 8080
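To check that the server actually answers requests once it is up, a quick smoke test like the one below can help. This is a minimal sketch assuming the balance_serve backend exposes an OpenAI-compatible /v1/chat/completions endpoint on the configured port (8080); the model id string is a placeholder and may need to match whatever name your server reports.

```python
# Minimal smoke test against the server launched above.
# Assumption: ktransformers serves an OpenAI-compatible chat completions
# endpoint at /v1/chat/completions on port 8080; adjust if your build differs.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "Qwen3-Coder-480B-A35B-Instruct",  # placeholder model id
        "messages": [
            {"role": "user", "content": "Write a Python function that reverses a string."}
        ],
        "max_tokens": 128,
        "stream": False,
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```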
Example decode log:
decode_batch_i: 1,
Model execution time (GPU): 299.299 ms, 3.341 tokens/s
1614
decode_batch_i: 1,
Model execution time (GPU): 220.331 ms, 4.539 tokens/s
38005
decode_batch_i: 1,
Model execution time (GPU): 222.258 ms, 4.499 tokens/s
58252
decode_batch_i: 1,
Model execution time (GPU): 216.355 ms, 4.622 tokens/s
42387
decode_batch_i: 1,
Model execution time (GPU): 391.508 ms, 2.554 tokens/s
1005
decode_batch_i: 1,
Model execution time (GPU): 266.367 ms, 3.754 tokens/s
35083
decode_batch_i: 1,
Model execution time (GPU): 216.006 ms, 4.630 tokens/s
10116
decode_batch_i: 1,
Model execution time (GPU): 212.021 ms, 4.717 tokens/s
15
decode_batch_i: 1,
Model execution time (GPU): 222.697 ms, 4.490 tokens/s
921
decode_batch_i: 1,
Model execution time (GPU): 226.164 ms, 4.422 tokens/s
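For what it's worth, the tokens/s figures in these log lines appear to be just the reciprocal of the per-step GPU execution time at batch size 1 (e.g. 1000 / 299.299 ms ≈ 3.341 tokens/s), so they reflect raw single-request decode latency rather than aggregate throughput. A small sanity check over the values above:

```python
# Sanity check: reported tokens/s == 1000 / execution_time_ms at batch size 1.
# Times taken directly from the log excerpts above.
times_ms = [299.299, 220.331, 222.258, 216.355, 391.508,
            266.367, 216.006, 212.021, 222.697, 226.164]
for t in times_ms:
    print(f"{t:.3f} ms -> {1000.0 / t:.3f} tokens/s")
```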
I also tried the UD-Q8 quant from unsloth/Qwen3-Coder-480B-A35B-Instruct-1M-GGUF, with no luck.