fastertransformer_backend
How to deploy multiple models on a node with multiple GPUs
Description
Suppose I have 5 GPT models, each with TP=2, and I want to deploy them on a machine with 8 GPUs. Is this possible? If so, how do I control the GPU allocation? I tried setting CUDA_VISIBLE_DEVICES when launching the Triton server, but it did not work. A sketch of what I was attempting follows.
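To make the intent concrete, here is a minimal sketch of the per-instance GPU pinning I expected to work: one Triton server process per model, each restricted to a pair of GPUs via CUDA_VISIBLE_DEVICES. Note that 5 models at TP=2 need 10 GPU slots, so at least two instances would have to share a pair on an 8-GPU node. The model-repository paths, port numbers, and GPU assignments below are illustrative placeholders, not a verified configuration.

```bash
# Hypothetical layout: one tritonserver per model, pinned to a GPU pair.
# Each instance needs its own HTTP/GRPC/metrics ports to avoid conflicts.
# With 8 GPUs and five TP=2 models, the fifth instance reuses GPUs 0,1.
CUDA_VISIBLE_DEVICES=0,1 tritonserver --model-repository=/models/gpt_a \
    --http-port 8000 --grpc-port 8001 --metrics-port 8002 &

CUDA_VISIBLE_DEVICES=2,3 tritonserver --model-repository=/models/gpt_b \
    --http-port 8010 --grpc-port 8011 --metrics-port 8012 &

CUDA_VISIBLE_DEVICES=4,5 tritonserver --model-repository=/models/gpt_c \
    --http-port 8020 --grpc-port 8021 --metrics-port 8022 &

CUDA_VISIBLE_DEVICES=6,7 tritonserver --model-repository=/models/gpt_d \
    --http-port 8030 --grpc-port 8031 --metrics-port 8032 &

CUDA_VISIBLE_DEVICES=0,1 tritonserver --model-repository=/models/gpt_e \
    --http-port 8040 --grpc-port 8041 --metrics-port 8042 &
```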
Reproduced Steps
Set CUDA_VISIBLE_DEVICES before launching the Triton server to restrict the visible GPUs; the restriction did not take effect.
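For concreteness, a single attempted launch looked roughly like this (the model-repository path is a placeholder):

```bash
# Attempted launch; restricting GPU visibility this way did not work.
CUDA_VISIBLE_DEVICES=0,1 tritonserver --model-repository=/models/gpt_a
```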