
Question on Continuous Batching Feature in vLLM Worker

Open un-certainty opened this issue 1 year ago • 0 comments

Hello,

I am reaching out for clarification on the continuous batching mechanism in the vLLM worker, as described in the vLLM Integration README. The documentation states that the worker "offers advanced continuous batching and a much higher (~10x) throughput."

I have successfully started a vLLM model worker and would like to confirm whether continuous batching is enabled by default. I would also like to understand how the vLLM worker handles a high number of concurrent requests. Specifically, does it automatically batch incoming requests while still returning each response individually, as I would expect?
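For context on what "continuous batching" refers to here, the following is a minimal toy sketch of iteration-level scheduling under simplifying assumptions. It is not vLLM's actual scheduler. `Request`, `continuous_batching`, `tokens_to_generate`, `arrival_step`, and `max_batch_size` are all hypothetical names, and each request is assumed to produce exactly one token per decode step. The point it illustrates is that free batch slots are refilled at every step, and each request leaves the batch (and could be returned) as soon as it finishes, rather than waiting for the whole batch to drain.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    tokens_to_generate: int  # hypothetical: number of decode steps this request needs
    arrival_step: int        # hypothetical: step at which the request reaches the worker
    generated: int = 0


def continuous_batching(requests, max_batch_size):
    """Toy iteration-level scheduler; returns {request id: step at which it finished}."""
    pending = deque(sorted(requests, key=lambda r: r.arrival_step))
    running = []
    finished = {}
    step = 0
    while pending or running:
        # Admit arrived requests into free slots without waiting for the batch to drain.
        while pending and pending[0].arrival_step <= step and len(running) < max_batch_size:
            running.append(pending.popleft())
        # One decode step: every running request produces one token.
        for req in running:
            req.generated += 1
        step += 1
        # Finished requests leave the batch immediately and are reported individually.
        still_running = []
        for req in running:
            if req.generated >= req.tokens_to_generate:
                finished[req.rid] = step
            else:
                still_running.append(req)
        running = still_running
    return finished


reqs = [Request("a", 3, 0), Request("b", 1, 0), Request("c", 2, 1)]
print(continuous_batching(reqs, max_batch_size=2))  # {'b': 1, 'a': 3, 'c': 3}
```

In this toy run, request "c" arrives at step 1 and immediately takes the slot freed by "b", so it finishes at step 3 alongside "a". A static batch of size 2 would have made "c" wait until "a" completed.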

un-certainty, Apr 23 '24 08:04