text-generation-inference
How to create an NCCL group on Kubernetes?
I am deploying text-generation-inference on EKS with each node having 1 NVIDIA A10G GPU.
How should I create an NCCL group so that a model like llama-2-13b-chat can use GPUs across nodes for inference?
You would need to change the source code to make NCCL communicate over a network socket.
However, why not deploy on a single 4xA10G node instead? Latency is likely to be much better. We have never deployed with NCCL over the network, because inter-node network bandwidth would almost certainly kill performance.
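For context, NCCL's transport can be steered with environment variables when it runs over TCP sockets rather than NVLink or InfiniBand. A rough sketch of the variables involved (these are standard NCCL settings, but the interface name `eth0` is an assumption about the pod network, and setting them does not by itself make text-generation-inference span nodes):

```shell
# Hedged sketch: pointing NCCL at the TCP socket transport between pods.
export NCCL_IB_DISABLE=1        # standard EKS nodes have no InfiniBand; fall back to sockets
export NCCL_SOCKET_IFNAME=eth0  # NIC the pods use for the cluster network (assumed name)
export NCCL_DEBUG=INFO          # log which transport NCCL actually selects at init
```

Even with this in place, each cross-node all-reduce pays the pod network's latency and bandwidth cost on every token, which is why a single multi-GPU node is the recommended deployment.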