
How to deploy multiple models on a node with multiple GPUs


Description

Suppose I have 5 GPT models, each with TP=2, and I want to deploy them on a machine with 8 GPUs. Is this possible? If so, how do I control the GPU allocation? I tried setting CUDA_VISIBLE_DEVICES when launching the Triton server, but it does not work.
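For context, Triton's standard mechanism for pinning a model to specific devices is the `instance_group` block in that model's `config.pbtxt`. A minimal sketch (model name and GPU indices are hypothetical; whether the FasterTransformer backend honors the `gpus` field for TP>1 models is exactly what this issue is asking about):

```protobuf
# config.pbtxt for one hypothetical TP=2 model ("gpt_a")
# Pin this model's single instance to GPUs 0 and 1;
# a second model ("gpt_b") would use gpus: [2, 3], and so on.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0, 1]
  }
]
```

With 8 GPUs and 5 TP=2 models, only four non-overlapping GPU pairs exist, so at least two models would have to share a pair.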

Reproduced Steps

Tried setting CUDA_VISIBLE_DEVICES when launching the Triton server; the models were not placed on the expected GPUs.
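A sketch of the kind of launch that was attempted (paths and ports are hypothetical). The idea is to run one Triton server process per model, each restricted to a GPU pair via CUDA_VISIBLE_DEVICES:

```shell
# Hypothetical: one tritonserver process per TP=2 model,
# each limited to two GPUs via CUDA_VISIBLE_DEVICES.
# Ports must differ so the servers do not collide.
CUDA_VISIBLE_DEVICES=0,1 tritonserver \
  --model-repository=/models/gpt_a \
  --http-port=8000 --grpc-port=8001 --metrics-port=8002 &

CUDA_VISIBLE_DEVICES=2,3 tritonserver \
  --model-repository=/models/gpt_b \
  --http-port=8010 --grpc-port=8011 --metrics-port=8012 &
```

Note this is a per-process workaround rather than true multi-model placement inside one server; if the backend spawns its own MPI/worker processes, they may not inherit the intended device mask, which could explain why the attempt did not work.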

jjjjohnson avatar Sep 14 '23 06:09 jjjjohnson