llmperf
Subsequent requests cannot be sent until 'num_concurrent_requests' requests have all finished
Hello,
I've encountered an issue where the request launcher does not send any subsequent requests until all of the requests specified by num_concurrent_requests have finished.
This behavior seems counterintuitive for accurately benchmarking TTFT and throughput on continuous-batching serving systems, since subsequent requests are blocked even when the server could already accept them.
To address this, I believe the get_next_ready function should be modified as follows, so that it returns results as soon as each individual request completes:
--- a/src/llmperf/requests_launcher.py
+++ b/src/llmperf/requests_launcher.py
@@ -40,6 +40,7 @@ class RequestsLauncher:
         if not block:
             while self._llm_client_pool.has_next():
                 results.append(self._llm_client_pool.get_next_unordered())
+            return results
         else:
             while not self._llm_client_pool.has_next():
                 pass
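To illustrate the effect of the change (this is a standalone sketch, not llmperf code; fake_request and the thread pool are stand-ins I made up for the actor pool), the snippet below shows a non-blocking drain that collects only the requests that have already finished, leaving slower requests in flight so new ones could be launched immediately:

```python
# Illustrative sketch (not llmperf code): a non-blocking drain collects
# only completed requests, instead of waiting for the whole batch.
import concurrent.futures
import time

def fake_request(delay):
    # Stand-in for an LLM request with a given latency in seconds.
    time.sleep(delay)
    return delay

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(fake_request, d) for d in (0.05, 0.4, 0.4, 0.4)}

    # Analogous to get_next_ready(block=False) with the proposed early
    # return: after a short wait, harvest only what is done right now.
    time.sleep(0.1)
    ready = [f.result() for f in futures if f.done()]
    print(ready)  # only the fast request has finished; the rest stay in flight
```

With the early return, the caller gets the fast request's result back immediately and can schedule a replacement, which is what a continuous-batching benchmark needs.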
I am prepared to submit a pull request with this change and would appreciate your feedback.
Thank you.