
No response to requests when using --pp_size > 1 in a multi-node environment

Open · Archmilio opened this issue on May 09, 2025 · 0 comments

System Info

  • GPU: NVIDIA H100
  • TensorRT-LLM version: v0.20.0rc1
  • Environment: Docker

Who can help?

No response

Information

  • [x] The official example scripts
  • [x] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/deepseek_v3#multi-node

I followed the instructions on the page above and ran the following command:

mpirun -v \
  -H HOST1:8,HOST2:8 \
  -x LD_LIBRARY_PATH=/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:$LD_LIBRARY_PATH \
  -x NCCL_IB_DISABLE=0 \
  -x NCCL_IBEXT_DISABLE=0 \
  -x NCCL_NET_GDR_LEVEL=1 \
  -x NCCL_DEBUG=INFO \
  -x NCCL_IB_HCA=mlx5_0:1,mlx5_3:1,mlx5_6:1,mlx5_7:1 \
  -x NCCL_SOCKET_IFNAME=eth0 \
  -mca plm_rsh_args "-p 2233" \
  --allow-run-as-root \
  -n 16 \
  trtllm-llmapi-launch trtllm-serve /logs/DeepSeek-R1 \
    --backend pytorch --tp_size 8 --pp_size 2 --gpus_per_node 8 \
    --extra_llm_api_options /logs/trtllm-serve/extra-llm-api-config.yml
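Before debugging the server itself, it can help to confirm the MPI launch places ranks as intended. A quick sanity check (a sketch reusing the same hosts and SSH port as above; hostname is just a stand-in payload):

mpirun -H HOST1:8,HOST2:8 \
  -mca plm_rsh_args "-p 2233" \
  --allow-run-as-root \
  -n 16 hostname
# Expected output: 8 lines with HOST1's hostname and 8 with HOST2's,
# confirming 16 ranks split 8 per node.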

It works well with tp_size=16 (pure tensor parallelism), but with tp_size=8 and pp_size=2 (still 8 × 2 = 16 ranks across the two 8-GPU nodes, matching -n 16), the model loads successfully yet the server never responds to incoming requests.

  • Model loads successfully
  • /v1/models returns metadata
  • /v1/chat/completions requests hang with no response (see the example calls below)
    • No error messages in the logs; the server remains idle
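For reference, the hanging call is a standard OpenAI-compatible chat completion request. A minimal sketch of the two requests used to observe this (HOST1 and port 8000 are assumptions — trtllm-serve listens on port 8000 by default — and the model name is a placeholder):

# Returns model metadata as expected:
curl http://HOST1:8000/v1/models

# Hangs indefinitely with no response when pp_size=2:
curl http://HOST1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "DeepSeek-R1", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'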

Expected behavior

Requests are processed successfully and completions are returned.

Actual behavior

Requests are never processed; the chat completion call hangs indefinitely with no error.

Additional notes

N/A
