
Subsequent requests cannot be sent until 'num_concurrent_requests' requests have all finished

Open llsj14 opened this issue 8 months ago • 5 comments

Hello,

I've encountered an issue where the request launcher does not allow new requests to be sent until all of the requests specified by num_concurrent_requests have finished.

This behavior seems counterintuitive for accurately benchmarking TTFT and throughput on continuous-batching systems, as it can block subsequent requests even when the serving system is capable of handling them.

To address this, I believe the get_next_ready function should be modified as follows, so that it returns results as soon as each individual request completes:

--- a/src/llmperf/requests_launcher.py
+++ b/src/llmperf/requests_launcher.py
@@ -40,6 +40,7 @@ class RequestsLauncher:
         if not block:
             while self._llm_client_pool.has_next():
                 results.append(self._llm_client_pool.get_next_unordered())
+                return results
         else:
             while not self._llm_client_pool.has_next():
                 pass
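
To illustrate the difference, here is a standalone sketch (not llmperf's actual code; the fake_request helper and latencies are made up for demonstration) contrasting the two polling strategies with concurrent.futures. Waiting for every in-flight request before issuing new ones gates the launcher on the slowest request, whereas yielding each result as it completes keeps the pipeline full:

```python
# Sketch comparing "wait for all" vs. "yield as completed" polling.
# fake_request and the latency values are hypothetical stand-ins for
# real LLM requests of varying duration.
import time
from concurrent.futures import (
    ThreadPoolExecutor, as_completed, wait, ALL_COMPLETED,
)

def fake_request(latency):
    time.sleep(latency)
    return latency

latencies = [0.05, 0.2, 0.05, 0.05]

# Current behavior: block until every outstanding request is done.
with ThreadPoolExecutor(max_workers=4) as pool:
    start = time.monotonic()
    futures = [pool.submit(fake_request, lat) for lat in latencies]
    wait(futures, return_when=ALL_COMPLETED)
    # Elapsed time is gated by the slowest request (~0.2 s here).
    drain_all = time.monotonic() - start

# Proposed behavior: hand each result back as soon as it finishes,
# so the launcher can submit a replacement request immediately.
with ThreadPoolExecutor(max_workers=4) as pool:
    start = time.monotonic()
    futures = [pool.submit(fake_request, lat) for lat in latencies]
    first_ready = None
    for fut in as_completed(futures):
        if first_ready is None:
            # A new request could be launched at this point (~0.05 s).
            first_ready = time.monotonic() - start
        fut.result()

print(f"drain-all: {drain_all:.2f}s, first result ready: {first_ready:.2f}s")
```

Under the current behavior the 0.05 s slots stay idle until the 0.2 s request drains; with as-completed polling they can be refilled immediately, which is what the one-line patch above achieves for the client pool.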

I am prepared to submit a pull request with this change and would appreciate your feedback.

Thank you.

llsj14 — Jun 18 '24 16:06