igerry
It has been supported since 0.3.1; the latest 0.3.2 works without issue.

```
python ktransformers/server/main.py \
  --port 8080 \
  --architectures Qwen3MoeForCausalLM \
  --model_name Qwen3-235B-A22B-Instruct-2507 \
  --model_path "/mnt/shared/models/Qwen3-235B-A22B-Instruct-2507-GGUF" \
  --gguf_path "/mnt/shared/models/Qwen3-235B-A22B-Instruct-2507-GGUF/Q8_0" \
  ...
```
> > Please DO NOT ADD --**cache_lens**
>
> If I do not specify `cache_lens`, I am restricted to 16k length. How do I specify 256k context length?

Yes,...
Tried [UD-Q8 from unsloth/Qwen3-Coder-480B-A35B-Instruct-1M-GGUF](https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-1M-GGUF/tree/main/UD-Q8_K_XL) with no luck.
That's not enough, and not recommended. 300GB is fine.
```
python ktransformers/server/main.py \
  --port 8080 \
  --architectures Qwen3MoeForCausalLM \
  --model_name Qwen3-235B-A22B-Instruct-2507 \
  --model_path "/mnt/shared/models/Qwen3-235B-A22B-Instruct-2507-GGUF" \
  --gguf_path "/mnt/shared/models/Qwen3-235B-A22B-Instruct-2507-GGUF/Q8_0" \
  --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml \
  --cpu_infer 32 \
  --temperature 0.7 \
  --top_p 0.8 \
  ...
```
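Once the server is up, you can sanity-check it from any OpenAI-compatible client. A minimal sketch, assuming the ktransformers server exposes the standard `/v1/chat/completions` endpoint on the port above; the `model` field here is assumed to match the `--model_name` flag, so adjust it if your setup differs:

```python
import requests

# Hypothetical quick check against the server started above; assumes an
# OpenAI-compatible /v1/chat/completions endpoint on port 8080.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "Qwen3-235B-A22B-Instruct-2507",  # assumed to match --model_name
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "temperature": 0.7,
        "top_p": 0.8,
        "max_tokens": 64,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this returns a completion, the server and GGUF paths are wired up correctly and any remaining issues are down to the individual flags.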