[Question] How can I set max_tokens with tutelgroup/deepseek-671b?
Hi,
I ran the script to deploy the kimi-k2 model using tutelgroup/deepseek-671b. However, API requests with the max_tokens parameter do not seem to work as expected. Is this parameter supported?
max_tokens isn't handled through the REST JSON API yet. Instead, it is currently a static global setting, specified by the --max_seq_len argument (version 20250715 includes a fine-grained max_tokens optimization over CUDA graphs), e.g.:
docker run -it --rm --ipc=host --net=host --shm-size=8g --ulimit memlock=-1 \
--ulimit stack=67108864 --gpus=all -v /:/host -w /host$(pwd) \
tutelgroup/deepseek-671b:a100x8-chat-20250715 \
--try_path ./moonshotai/Kimi-K2-Instruct \
--serve --listen_port 8000 \
--prompt "Calculate the indefinite integral of 1/sin(x) + x" \
--max_seq_len 4200
That said, it is a good feature request, and we'll handle these settings through the REST API in a future release.
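For reference, here is a minimal Python sketch of calling this server; the /chat path and the JSON payload shape are assumptions borrowed from the curl example further down in this thread, and on the 20250715 image any max_tokens field in the request is expected to be ignored in favor of the server-side --max_seq_len:

import requests

# Minimal client sketch for the server started above (assumed /chat path
# and payload shape, taken from the curl example later in this thread).
# On the 20250715 image, generation length is capped by the server-side
# --max_seq_len setting, not by any per-request field.
resp = requests.post(
    "http://0.0.0.0:8000/chat",
    json={"messages": [{"role": "user", "content": "Hello."}]},
    timeout=600,
)
print(resp.text)  # response schema isn't shown in the thread, so print the raw body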
Thank you. Looking forward to your next update!
The new version, tutelgroup/deepseek-671b:mi300x8-chat-20250723, now allows setting max_tokens via the REST API:
curl -X POST http://0.0.0.0:8000/chat -d '{"messages": [{"role": "user", "content": "Hello."}], "max_tokens": 256 }'
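For programmatic use, here is a Python equivalent of the curl call above (a sketch; only the /chat path, the messages payload, and the max_tokens field are confirmed by the thread, and the response schema is not documented):

import requests

# Same request as the curl example: on the mi300x8-chat-20250723 image,
# max_tokens is honored per request rather than fixed by --max_seq_len.
resp = requests.post(
    "http://0.0.0.0:8000/chat",
    json={
        "messages": [{"role": "user", "content": "Hello."}],
        "max_tokens": 256,  # per-request generation cap (supported since 20250723)
    },
    timeout=600,
)
print(resp.text)  # raw response body; its schema isn't documented in the thread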