text-embeddings-inference
Feature Request: Multi-GPU inference or the ability to choose a GPU at startup
Feature request
Hello,
Thank you for releasing this inference server!
I have two requests, either of which would solve my specific problem:
- Ability to specify which GPU to use when starting the TEI server
- Alternatively, the ability to use all/N GPUs, with TEI load balancing traffic across them
Motivation
Currently, TEI only supports running inference on a single GPU. The advice I found in another issue here was to spin up multiple Docker containers and assign each one its own GPU.
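For concreteness, a minimal sketch of that workaround on a host with two GPUs, using Docker's `--gpus` device selection (the image tag and model id below are illustrative placeholders):

```bash
# Pin one TEI container to each GPU and expose them on separate host ports.
# Image tag and model id are placeholders; substitute your own.
docker run -d --gpus device=0 -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-large-en-v1.5

docker run -d --gpus device=1 -p 8081:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-large-en-v1.5
```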
In some environments, such as P2P GPU services (e.g. Vast.ai), the compute resource is itself a Docker container without access to the host, so I'm unable to spin up multiple containers to make use of multiple GPUs.
When I start multiple instances of TEI inside that container, they all use the first GPU. Adding a CLI argument to specify the GPU id/index would solve this issue.
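A possible interim workaround (untested here, and assuming TEI's CUDA backend honors the standard `CUDA_VISIBLE_DEVICES` device mask) would be to pin each instance to a different device at launch:

```bash
# Hypothetical workaround: mask the visible devices per process so each
# TEI instance only sees (and therefore uses) a single GPU.
# The model id is a placeholder; --port is TEI's existing flag.
CUDA_VISIBLE_DEVICES=0 text-embeddings-router \
  --model-id BAAI/bge-large-en-v1.5 --port 8080 &
CUDA_VISIBLE_DEVICES=1 text-embeddings-router \
  --model-id BAAI/bge-large-en-v1.5 --port 8081 &
```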
An alternative would be a CLI flag to use all/N GPUs, with TEI itself handling load balancing among them.
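Until something like that exists, the requested behavior can only be approximated from the client side. A crude sketch that alternates requests between the two per-GPU instances above (`/embed` is TEI's existing endpoint; the ports are the placeholders from the earlier sketches):

```bash
# Hand-rolled round-robin across the two per-GPU instances.
for i in 1 2 3 4; do
  port=$((8080 + i % 2))
  curl -s "http://127.0.0.1:${port}/embed" \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "hello world"}'
  echo
done
```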
Your contribution
Moral support