text-generation-inference
How to create an NCCL group on Kubernetes?
I am deploying text-generation-inference on EKS with each node having 1 NVIDIA A10G GPU.
How should I create an NCCL group so that a model like llama-2-13b-chat can use GPUs across nodes for inference?
You would need to change the source code to make NCCL communicate over a network socket.
However, why not deploy on a single 4xA10G node instead? Latency is likely to be much better. We have never deployed with NCCL over the network, because inter-node network bandwidth would almost certainly kill performance.
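For context, NCCL's transport can be steered with environment variables when it runs over TCP sockets rather than NVLink or InfiniBand. A rough sketch of the variables involved (these are standard NCCL settings, but the interface name `eth0` is an assumption about the pod network, and setting them does not by itself make text-generation-inference span nodes):

```shell
# Hedged sketch: pointing NCCL at the TCP socket transport between pods.
export NCCL_IB_DISABLE=1        # standard EKS nodes have no InfiniBand; fall back to sockets
export NCCL_SOCKET_IFNAME=eth0  # NIC the pods use for the cluster network (assumed name)
export NCCL_DEBUG=INFO          # log which transport NCCL actually selects at init
```

Even with this in place, each cross-node all-reduce pays the pod network's latency and bandwidth cost on every token, which is why a single multi-GPU node is the recommended deployment.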