No response to requests when using --pp_size > 1 in a multi-node environment
System Info
- GPU: NVIDIA H100
- Libraries: TensorRT-LLM v0.20.0rc1
- Environment: Docker
Who can help?
No response
Information
- [x] The official example scripts
- [x] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/deepseek_v3#multi-node
I followed the instructions on the page above and ran the following command:
```shell
mpirun -v \
    -H HOST1:8,HOST2:8 \
    -x LD_LIBRARY_PATH=/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:$LD_LIBRARY_PATH \
    -x NCCL_IB_DISABLE=0 \
    -x NCCL_IBEXT_DISABLE=0 \
    -x NCCL_NET_GDR_LEVEL=1 \
    -x NCCL_DEBUG=INFO \
    -x NCCL_IB_HCA=mlx5_0:1,mlx5_3:1,mlx5_6:1,mlx5_7:1 \
    -x NCCL_SOCKET_IFNAME=eth0 \
    -mca plm_rsh_args "-p 2233" \
    --allow-run-as-root \
    -n 16 \
    trtllm-llmapi-launch trtllm-serve /logs/DeepSeek-R1 --backend pytorch --tp_size 8 --pp_size 2 --gpus_per_node 8 --extra_llm_api_options /logs/trtllm-serve/extra-llm-api-config.yml
```
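For context, a minimal `extra-llm-api-config.yml` for this kind of multi-node DeepSeek setup might look like the sketch below. The keys shown (`enable_attention_dp`, `kv_cache_config.free_gpu_memory_fraction`, `print_iter_log`) are taken from the TensorRT-LLM LLM API options used in the DeepSeek examples; the exact contents of the file used here may differ.

```yaml
# Sketch of an extra_llm_api_options file for trtllm-serve (pytorch backend).
# Values are illustrative, not the exact file used in this report.
enable_attention_dp: true        # attention data parallelism, as in the DeepSeek examples
kv_cache_config:
  free_gpu_memory_fraction: 0.85 # fraction of free GPU memory reserved for KV cache
print_iter_log: true             # per-iteration logging, useful when diagnosing hangs
```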
Serving works as expected with --tp_size 16, but with --tp_size 8 --pp_size 2 the model loads successfully yet the server never answers incoming requests:
- The model loads successfully
- /v1/models returns the expected metadata
- A /v1/chat/completions request hangs with no response
- There are no error messages in the logs; the server simply remains idle
Expected behavior
The request to /v1/chat/completions is processed and a response is returned.
Actual behavior
The request hangs indefinitely; no response is returned and no errors appear in the logs.
Additional notes
N/A