ray_vllm_inference

Can I run a model service on multiple GPUs?

zrl4836 opened this issue 2 years ago • 1 comment

I want to run a model service on multiple GPUs without tensor parallelism (TP).

zrl4836 avatar Nov 29 '23 08:11 zrl4836

Yes, you can set the number of replicas in the deploy config YAML file or via the `@serve.deployment` decorator. Each replica runs as an independent copy of the model on its own GPU, and Ray Serve load-balances requests across them.
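For illustration, a minimal sketch of a Ray Serve deploy config that scales a deployment to two replicas, each pinned to one GPU. The `import_path` and deployment name here are placeholders, not the actual names used by this repo:

```yaml
# Ray Serve config sketch (names are hypothetical).
# Two independent replicas, one GPU each: data parallelism, no TP.
applications:
  - name: vllm-app
    import_path: my_module:deployment   # placeholder import path
    deployments:
      - name: VLLMDeployment            # placeholder deployment name
        num_replicas: 2
        ray_actor_options:
          num_gpus: 1
```

Equivalently, in code the replica count can be set directly on the decorator, e.g. `@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 1})`. Note this loads a full copy of the model per GPU, so each GPU must hold the entire model; TP is only needed when a single GPU cannot.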

asprenger avatar Nov 30 '23 00:11 asprenger