
Fixed a timeout issue in `api_models.py` when using async requests

Open dazipe opened this issue 8 months ago • 3 comments

The original implementation of the TemplateAPI class used a single ClientSession for all asynchronous API requests, which results in a timeout and cancellation of all requests after the default 300 seconds.
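The failure mode described above, one shared deadline expiring and cancelling every in-flight request together, can be reproduced with stdlib asyncio alone. This is a minimal model of the shared-session behaviour, not the harness code itself:

```python
import asyncio

async def slow_request(i: int) -> int:
    await asyncio.sleep(1.0)  # stand-in for an API call that exceeds the deadline
    return i

async def run_batch() -> str:
    # One deadline wraps the whole gather(): when it fires, every pending
    # request in the pool is cancelled together, not just the slow one.
    try:
        await asyncio.wait_for(
            asyncio.gather(*(slow_request(i) for i in range(4))),
            timeout=0.05,
        )
    except asyncio.TimeoutError:
        return "all requests cancelled"
    return "completed"

outcome = asyncio.run(run_batch())
print(outcome)  # all requests cancelled
```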

Here is an example which would lead to the timeout:

```shell
lm_eval \
  --model local-completions \
  --tasks mmlu_pro_math \
  --model_args model=Qwen2.5-72B-Instruct,base_url=http://localhost:8000/v1/completions,num_concurrent=16,max_retries=3,tokenizer=Qwen/Qwen2.5-72B-Instruct
```

Here, setting `num_concurrent=16` (any value > 1) makes the harness use asynchronous requests.

Solution: to address these issues, the following changes were made:

  1. Semaphore for concurrency control: a semaphore limits the number of concurrent requests, ensuring the number of simultaneous requests never exceeds the specified limit. This prevents the API from being overwhelmed and reduces the likelihood of timeouts.
  2. Simplified ClientSession initialization: each `amodel_call` now creates and disposes of its own ClientSession, so every request has its own session and the effects of a single request's failure stay isolated. Removing the TCPConnector and managing the ClientSession inside each `amodel_call` also makes the code cleaner and more straightforward.
  3. Updated `get_batched_requests` method: `get_batched_requests` creates a new task for each batch of requests, and each task manages its own ClientSession within `amodel_call`. Each batch is therefore handled independently, reducing the risk of a single failed request affecting the entire batch.
As a result, the timeout applies to each request individually rather than to the entire test run.

dazipe avatar Mar 14 '25 18:03 dazipe

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Mar 14 '25 18:03 CLAassistant

Hi! Thanks for the PR, but my understanding is that if one request times out, only that specific request is cancelled, even when sharing a ClientSession. You can also pass a custom timeout via `model_args`.
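For reference, the custom-timeout suggestion would look something like this on the CLI. The `timeout` key is an assumption here; check the TemplateAPI signature in `api_models.py` for the exact argument name:

```shell
lm_eval \
  --model local-completions \
  --tasks mmlu_pro_math \
  --model_args model=Qwen2.5-72B-Instruct,base_url=http://localhost:8000/v1/completions,num_concurrent=16,timeout=1200
```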

baberabb avatar Mar 28 '25 12:03 baberabb

Hello,

No, once a single request times out, all requests in the pool get cancelled... at least in my environment. I stumbled upon this while using asynchronous requests (`num_concurrent > 1`). Not only that, it also seems the timeout is not reset after a connection is reused; I'm not exactly sure. I played with the TCPConnector init parameters, but nothing helped. Please test it yourself: just set the timeout to something small and try the mmlu_pro test.

dazipe avatar Mar 28 '25 19:03 dazipe