
Request for Reproduction Configuration of DeepSeek-R1 on H200 & B200

Open xwuShirley opened this issue 9 months ago • 5 comments

Hi @kaiyux,

We're curious about the details in this blog post: https://developer.nvidia.com/blog/nvidia-blackwell-delivers-world-record-deepseek-r1-inference-performance/

Specifically, could you share the configuration used to reproduce the results shown in the image below for H200 and B200?

[Image: throughput chart for DeepSeek-R1 on H200 and B200 from the blog post]

Really appreciate the incredible work!

Best, Shirley

xwuShirley avatar Mar 20 '25 21:03 xwuShirley

trtllm-serve  nvidia/DeepSeek-R1-FP4 \
--max_batch_size 256 --max_num_tokens 32768 \
--max_seq_len 32768 --kv_cache_free_gpu_memory_fraction 0.95 \
--host 0.0.0.0 --port 30001 --trust_remote_code --backend pytorch --tp_size 8 --ep_size 8

It seems that for B200, the above did not give us the reported 253 TPS.
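For anyone trying to compare against the blog's per-user number, a rough single-stream probe like the sketch below can help. It assumes the OpenAI-compatible /v1/completions endpoint that trtllm-serve exposes and counts one output token per streamed chunk, which is only an approximation; the prompt and token budget are placeholders:

```python
# Rough per-user decode-throughput probe against a running trtllm-serve
# instance. Assumes the OpenAI-compatible /v1/completions endpoint and
# treats each streamed SSE chunk as roughly one token (an approximation).
import json
import time

import requests

URL = "http://localhost:30001/v1/completions"
payload = {
    "model": "nvidia/DeepSeek-R1-FP4",
    "prompt": "Explain the Chinese Remainder Theorem.",  # placeholder prompt
    "max_tokens": 512,  # placeholder token budget
    "stream": True,
}

start = time.time()
tokens = 0
with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # SSE payload lines are prefixed with "data: "
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        if chunk["choices"][0].get("text"):
            tokens += 1  # ~1 token per streamed chunk

elapsed = time.time() - start
print(f"~{tokens / elapsed:.1f} output tokens/s for a single user")
```

Note that a single-stream reading like this ignores batching effects, so it only bounds the per-user figure rather than reproducing the benchmark methodology.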

xwuShirley avatar Mar 20 '25 21:03 xwuShirley

@Edwardf0t1 You uploaded the model weights at https://huggingface.co/nvidia/DeepSeek-R1-FP4/tree/main. Would you know the deployment configuration for trtllm-serve? Thank you :)

xwuShirley avatar Mar 20 '25 21:03 xwuShirley

@kaiyux @Kefeng-Duan for visibility on this question from the community. @laikhtewari for visibility as well.

June

juney-nvidia avatar Mar 24 '25 23:03 juney-nvidia

Hi @xwuShirley, thanks for your attention. There are some changes we haven't updated in the main branch yet; we will keep you posted.

kaiyux avatar Mar 25 '25 12:03 kaiyux

> trtllm-serve nvidia/DeepSeek-R1-FP4 \
> --max_batch_size 256 --max_num_tokens 32768 \
> --max_seq_len 32768 --kv_cache_free_gpu_memory_fraction 0.95 \
> --host 0.0.0.0 --port 30001 --trust_remote_code --backend pytorch --tp_size 8 --ep_size 8
>
> It seems that for B200, the above did not give us the reported 253 TPS.

Please refer to this comment from another community member:

  • https://github.com/NVIDIA/TensorRT-LLM/issues/3058#issuecomment-2753688626

juney-nvidia avatar Mar 26 '25 14:03 juney-nvidia

Closing based on https://github.com/NVIDIA/TensorRT-LLM/issues/2964#issuecomment-2754585600. Feel free to reopen 👍

poweiw avatar Aug 12 '25 22:08 poweiw