vllm
[Usage]: Given sufficient GPU memory, which is better: starting a single vLLM instance or starting multiple instances for load balancing?
Your current environment
A100-80G×8
How would you like to use vllm
On this 8 × A100-80G machine, with enough GPU memory to hold the model, is it better to run a single vLLM instance or to run multiple instances and load-balance requests across them?
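To make the multi-instance option concrete, one common layout splits the 8 GPUs into equal groups, runs one server per group on its own port, and dispatches requests round-robin. The sketch below only illustrates that layout under stated assumptions. The helper names (`plan_instances`, `RoundRobinBalancer`) are hypothetical, and a tensor-parallel size of 2 is an arbitrary example. The printed commands assume the `vllm serve` CLI with its `--tensor-parallel-size` and `--port` flags, with `<model>` as a placeholder, so check them against your vLLM version. In practice the dispatching would usually be handled by a dedicated load balancer such as nginx rather than by in-process code.

```python
from itertools import cycle


def plan_instances(num_gpus: int, tp_size: int, base_port: int = 8000):
    """Split num_gpus into independent instances of tp_size GPUs each.

    Returns a list of (CUDA_VISIBLE_DEVICES value, port) pairs, one per instance.
    Hypothetical helper for illustration only.
    """
    if tp_size <= 0 or num_gpus % tp_size != 0:
        raise ValueError("num_gpus must be a positive multiple of tp_size")
    plans = []
    for i in range(num_gpus // tp_size):
        gpus = range(i * tp_size, (i + 1) * tp_size)
        plans.append((",".join(str(g) for g in gpus), base_port + i))
    return plans


class RoundRobinBalancer:
    """Hands out endpoints in a fixed rotation (hypothetical, illustrative)."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._endpoints = cycle(endpoints)

    def next_endpoint(self) -> str:
        return next(self._endpoints)


if __name__ == "__main__":
    tp = 2  # arbitrary example: 4 instances x 2 GPUs each on an 8-GPU node
    for devices, port in plan_instances(8, tp):
        # Assumes the `vllm serve` CLI; <model> is a placeholder.
        print(f"CUDA_VISIBLE_DEVICES={devices} vllm serve <model> "
              f"--tensor-parallel-size {tp} --port {port}")
```

Setting `tp_size` to 8 reduces this to the single-instance case (one server spanning all GPUs). Setting it to 1 gives eight independent single-GPU replicas. Which split performs better depends on the model size and the traffic pattern.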