ray_vllm_inference
Can I run a model service on multiple GPUs?
I want to run a model service on multiple GPUs without tensor parallelism (TP).
Yes. Instead of sharding one model across GPUs with TP, you can run multiple independent replicas, one per GPU, by setting the number of replicas either in the Serve config YAML file or via the `@serve.deployment` decorator.
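A minimal sketch of the decorator approach, assuming Ray Serve 2.x and vLLM's offline `LLM` class; the deployment class, model name, and request format here are illustrative, not taken from this repo:

```python
from ray import serve
from vllm import LLM, SamplingParams


@serve.deployment(
    num_replicas=2,                     # two independent copies of the model, no TP
    ray_actor_options={"num_gpus": 1},  # Ray pins each replica to its own GPU
)
class VLLMDeployment:
    def __init__(self):
        # Each replica loads a full copy of the model. Ray sets
        # CUDA_VISIBLE_DEVICES per replica, so vLLM only sees its assigned GPU.
        self.llm = LLM(model="facebook/opt-125m")  # illustrative model choice

    async def __call__(self, request):
        prompt = (await request.json())["prompt"]
        params = SamplingParams(max_tokens=64)
        # Blocking generate call; fine for a simple sketch.
        outputs = self.llm.generate([prompt], params)
        return {"text": outputs[0].outputs[0].text}


app = VLLMDeployment.bind()
# serve.run(app)  # Serve load-balances incoming requests across the replicas
```

The YAML route is equivalent: put `num_replicas: 2` and `ray_actor_options: {num_gpus: 1}` under the deployment's entry in the Serve config file. Either way, Ray Serve routes incoming requests across the replicas, so throughput scales with GPU count while each request is served by a single GPU.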