How to create an NCCL group on Kubernetes?

Open rsaxena-rajat opened this issue 2 years ago • 1 comment

I am deploying text-generation-inference on EKS with each node having 1 NVIDIA A10G GPU.

How should I create a group such that a model like llama-2-13b-chat is able to use GPUs across nodes for inference?

rsaxena-rajat avatar Aug 10 '23 09:08 rsaxena-rajat

You would need to change the source code to use a network socket for NCCL.

However, why not deploy on a single 4xA10G instance instead? Latency is likely to be much better. We have never deployed with NCCL over the network, because inter-node network traffic would almost certainly kill performance.

Narsil avatar Aug 10 '23 09:08 Narsil
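To make the sizing behind the 4xA10G suggestion concrete, here is a rough back-of-the-envelope sketch. It assumes about 13 billion parameters stored in fp16 (2 bytes each) and a nominal 24 GB of memory per A10G. It counts weights only and ignores the KV cache, activations, and framework overhead, so real requirements are higher. The function names are illustrative and not part of text-generation-inference.

```python
import math

# Assumptions (not from the thread): ~13e9 parameters, fp16 weights,
# and a nominal 24 GB of memory per A10G.
PARAMS_13B = 13e9
A10G_MEMORY_GB = 24


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: float, gpu_memory_gb: float,
                         bytes_per_param: int = 2) -> int:
    """Smallest GPU count whose combined memory can hold the weights."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


total_gb = weight_memory_gb(PARAMS_13B)                       # weights in fp16
min_gpus = min_gpus_for_weights(PARAMS_13B, A10G_MEMORY_GB)   # lower bound only
per_shard_gb = total_gb / 4                                   # even split over 4 GPUs

print(f"fp16 weights: {total_gb:.1f} GB")
print(f"minimum A10Gs for weights alone: {min_gpus}")
print(f"weights per GPU with 4 shards: {per_shard_gb:.1f} GB")
```

Under these assumptions the weights do not fit on one A10G, which is why the model has to be sharded at all, while a 4-way split leaves most of each GPU's memory free for the KV cache. On a single multi-GPU node, sharding is requested through the launcher's `--num-shard` option, and NCCL then communicates within the node rather than over the network.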

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Apr 12 '24 01:04 github-actions[bot]