FastChat
Question on Continuous Batching Feature in vLLM Worker
Hello,
I am reaching out for clarification on the continuous batching mechanism in the vLLM worker, as described in the vLLM Integration README. The documentation states that vLLM "offers advanced continuous batching and a much higher (~10x) throughput."
I have successfully started a vLLM model worker and would like to confirm whether continuous batching is enabled by default. I am also interested in how the vLLM worker handles a high volume of concurrent requests. Specifically, does it automatically batch incoming requests and return each response individually as it completes?
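To make the concurrency question concrete, here is a minimal sketch of the client-side pattern I have in mind. The `send_request` helper is a placeholder that simulates a call to the worker (the sleep stands in for generation latency); it is not FastChat's actual API. The behavior I am hoping the worker supports is: many requests in flight at once, each response returned on its own as soon as it is ready rather than waiting for the whole batch.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Placeholder for an HTTP call to the worker's endpoint; the sleep
# simulates per-request generation time of different lengths.
def send_request(prompt: str, latency: float) -> str:
    time.sleep(latency)
    return f"response to {prompt!r}"

# Three requests submitted at the same time, with different latencies.
prompts = [("p0", 0.3), ("p1", 0.1), ("p2", 0.2)]

completion_order = []
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(send_request, p, lat): p for p, lat in prompts}
    # as_completed yields each future individually, in the order the
    # responses finish -- not the order the requests were submitted.
    for fut in as_completed(futures):
        completion_order.append(futures[fut])

print(completion_order)  # fastest-finishing request appears first
```

If the vLLM worker batches continuously, I would expect exactly this shape from the client's perspective: the short request ("p1") comes back first even though it was submitted after "p0".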