TensorRT-LLM
fix: correct cudaSetDevice error when GPUs per node are fewer than their ranks in inter-node inference
#1494
I am not in favor of having function parameter defaults that change depending on the environment; defaults should be compile-time constants. I suggest changing run.py instead so that it passes the correct number of devices per node into the bindings.
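A minimal sketch of what the suggestion amounts to: the launcher script (run.py) would determine how many GPUs are visible on the local node and derive a node-local device index from the global rank, rather than the bindings defaulting to a value read from the environment. The names `local_device_id` and `devices_per_node` are illustrative only, not the actual TensorRT-LLM API; in practice `devices_per_node` could come from e.g. `torch.cuda.device_count()` on each node.

```python
def local_device_id(global_rank: int, devices_per_node: int) -> int:
    """Map a global rank to a device index that is valid on this node.

    Hypothetical helper illustrating the review suggestion: the caller
    (run.py) supplies devices_per_node explicitly instead of the bindings
    guessing it from the environment.
    """
    if devices_per_node <= 0:
        raise ValueError("devices_per_node must be positive")
    # Ranks are assumed to be assigned to nodes in contiguous blocks,
    # so the node-local index is the remainder modulo the node's GPU count.
    return global_rank % devices_per_node


# Example: 16 ranks spread over 2 nodes with 8 GPUs each.
# Rank 9 lands on the second node and must call cudaSetDevice(1),
# not cudaSetDevice(9), which would fail with fewer than 10 GPUs per node.
print(local_device_id(9, 8))
```

With this shape, the value passed to `cudaSetDevice` is always within range for the local node, which is the failure the PR title describes.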