No response to requests when using --pp_size > 1 in a multi-node environment
System Info
- GPU: NVIDIA H100
- Libraries: TensorRT-LLM v0.20.0rc1
- Environment: Docker
Who can help?
No response
Information
- [x] The official example scripts
- [x] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/deepseek_v3#multi-node
I followed the instructions on the page above and ran the following command:
```shell
mpirun -v \
    -H HOST1:8,HOST2:8 \
    -x LD_LIBRARY_PATH=/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:$LD_LIBRARY_PATH \
    -x NCCL_IB_DISABLE=0 \
    -x NCCL_IBEXT_DISABLE=0 \
    -x NCCL_NET_GDR_LEVEL=1 \
    -x NCCL_DEBUG=INFO \
    -x NCCL_IB_HCA=mlx5_0:1,mlx5_3:1,mlx5_6:1,mlx5_7:1 \
    -x NCCL_SOCKET_IFNAME=eth0 \
    -mca plm_rsh_args "-p 2233" \
    --allow-run-as-root \
    -n 16 \
    trtllm-llmapi-launch trtllm-serve /logs/DeepSeek-R1 --backend pytorch --tp_size 8 --pp_size 2 --gpus_per_node 8 --extra_llm_api_options /logs/trtllm-serve/extra-llm-api-config.yml
```
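For context, a minimal `extra-llm-api-config.yml` for this kind of multi-node DeepSeek setup might look like the sketch below. The keys shown (`enable_attention_dp`, `kv_cache_config.free_gpu_memory_fraction`, `print_iter_log`) are taken from the TensorRT-LLM LLM API options used in the DeepSeek examples; the exact contents of the file used here may differ.

```yaml
# Sketch of an extra_llm_api_options file for trtllm-serve (pytorch backend).
# Values are illustrative, not the exact file used in this report.
enable_attention_dp: true        # attention data parallelism, as in the DeepSeek examples
kv_cache_config:
  free_gpu_memory_fraction: 0.85 # fraction of free GPU memory reserved for KV cache
print_iter_log: true             # per-iteration logging, useful when diagnosing hangs
```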
Serving works as expected with --tp_size 16, but with --tp_size 8 --pp_size 2 the model loads successfully yet the server never answers incoming requests:
- The model loads successfully
- /v1/models returns the expected metadata
- A /v1/chat/completions request hangs with no response
- There are no error messages in the logs; the server simply remains idle
Expected behavior
The request to /v1/chat/completions is processed and a response is returned.
Actual behavior
The request hangs indefinitely; no response is returned and no errors appear in the logs.
Additional notes
N/A