ray_vllm_inference
Can I run a model service on multiple GPUs?
I want to run a model service on multiple GPUs without tensor parallelism (TP).
Yes. Instead of sharding one model across GPUs with TP, you can run multiple independent replicas, one per GPU, by setting the number of replicas either in the Serve config YAML file or via the `@serve.deployment` decorator.
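A minimal sketch of the decorator approach, assuming Ray Serve 2.x and vLLM's offline `LLM` class; the deployment class, model name, and request format here are illustrative, not taken from this repo:

```python
from ray import serve
from vllm import LLM, SamplingParams


@serve.deployment(
    num_replicas=2,                     # two independent copies of the model, no TP
    ray_actor_options={"num_gpus": 1},  # Ray pins each replica to its own GPU
)
class VLLMDeployment:
    def __init__(self):
        # Each replica loads a full copy of the model. Ray sets
        # CUDA_VISIBLE_DEVICES per replica, so vLLM only sees its assigned GPU.
        self.llm = LLM(model="facebook/opt-125m")  # illustrative model choice

    async def __call__(self, request):
        prompt = (await request.json())["prompt"]
        params = SamplingParams(max_tokens=64)
        # Blocking generate call; fine for a simple sketch.
        outputs = self.llm.generate([prompt], params)
        return {"text": outputs[0].outputs[0].text}


app = VLLMDeployment.bind()
# serve.run(app)  # Serve load-balances incoming requests across the replicas
```

The YAML route is equivalent: put `num_replicas: 2` and `ray_actor_options: {num_gpus: 1}` under the deployment's entry in the Serve config file. Either way, Ray Serve routes incoming requests across the replicas, so throughput scales with GPU count while each request is served by a single GPU.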