TensorRT-LLM
Unnecessary assertion in cpp implementation of worldConfig.cpp
https://github.com/NVIDIA/TensorRT-LLM/blob/3d56a445e8ebf888e78be638faf6beec0a78f3c2/cpp/tensorrt_llm/runtime/worldConfig.cpp#L74
Hi,
I've run into a small bug in the C++ implementation of the runtime code. I am running multi-node inference on Llama 2 with pipeline parallelism 2 and tensor parallelism 8; each node on my system has 8 GPUs. The run fails because of the assertion at the line hyperlinked above, which requires PP size * TP size <= num_GPUs_per_node. I believe the check should instead be PP size * TP size <= world_size, though perhaps I am misunderstanding something. Also, there seems to be no way to specify the number of GPUs per node in this code.
Hi, is there any update on this?
@MartinMarciniszyn @byshiue Could you please help answer this query?
The assertion is already gone in the main branch.