ray_vllm_inference
Can I run a model service on multiple GPUs?
I want to run a model service on multiple GPUs without tensor parallelism (TP), i.e. run an independent full copy of the model on each GPU rather than sharding one model across them.
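For reference, a minimal sketch of one common way to do this with Ray Serve: scale out with multiple single-GPU replicas, each holding its own vLLM engine with `tensor_parallel_size=1`, so Serve load-balances requests across GPUs instead of using TP. The model name, replica count, and request schema below are placeholders, not from this repo.

```python
# Sketch: data-parallel serving with Ray Serve replicas, no tensor parallelism.
from ray import serve
from vllm import LLM, SamplingParams


@serve.deployment(
    num_replicas=2,                      # one replica per GPU
    ray_actor_options={"num_gpus": 1},   # pin each replica to a single GPU
)
class VLLMReplica:
    def __init__(self):
        # Each replica loads a full copy of the model on its own GPU;
        # tensor_parallel_size=1 disables TP sharding.
        self.llm = LLM(model="facebook/opt-125m", tensor_parallel_size=1)

    async def __call__(self, request):
        # Assumed request body: {"prompt": "..."}
        prompt = (await request.json())["prompt"]
        params = SamplingParams(max_tokens=64)
        # Note: generate() is blocking; fine for a sketch, but a real
        # deployment would use vLLM's async engine instead.
        outputs = self.llm.generate([prompt], params)
        return {"text": outputs[0].outputs[0].text}


app = VLLMReplica.bind()
# serve.run(app)  # Serve round-robins incoming requests across the replicas.
```

With this layout, throughput scales with the number of replicas, but each GPU must be large enough to hold the whole model, which is the usual trade-off versus TP.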