Potential Memory Leak in Response Buffer
🐛 Bug
The `response_buffer` dictionary could grow indefinitely if requests fail before their responses are processed.
Implement a cleanup mechanism or timeout for orphaned entries.
Code sample
Expected behavior
Environment
If you published a Studio with your bug report, we can automatically get this information. Otherwise, please describe:
- PyTorch/JAX/TensorFlow version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (`conda`, `pip`, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
Additional context
"Could grow": is there a way we can test this so we can rerun it after writing a potential fix? Also, what do you want the cleanup mechanism to look like?
Hi @haddyadnan, to test this, I think you can simulate a batch of requests (say 100 or more) and randomly cancel around 30-40% of them before the response is processed or returned. You could emulate this by introducing an artificial delay (e.g., `time.sleep`) on the server side and cancelling the request from the client during that delay.
After this test run, you can inspect the `response_buffer`: it should still hold entries for the cancelled requests, indicating a potential memory leak risk.
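For a self-contained way to exercise that scenario, something like the asyncio sketch below could work. It models the buffer pattern directly instead of spinning up a real LitServe server, and the names (`handle_request`, `response_buffer`) are illustrative rather than LitServe's actual internals:

```python
import asyncio
import random

response_buffer = {}  # request_id -> asyncio.Event, registered on arrival

async def handle_request(request_id: str, delay: float) -> None:
    """Register the request, then simulate slow server-side work."""
    event = asyncio.Event()
    response_buffer[request_id] = event
    await asyncio.sleep(delay)       # artificial server-side delay
    event.set()
    del response_buffer[request_id]  # cleanup only happens on the happy path

async def main() -> None:
    tasks = [
        asyncio.create_task(handle_request(f"req-{i}", delay=0.5))
        for i in range(100)
    ]
    await asyncio.sleep(0.1)  # let every request register itself
    # Cancel roughly 35% of the in-flight requests, like client disconnects.
    for task in random.sample(tasks, k=35):
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    # Cancelled handlers never reach the `del`, so their entries linger.
    print(f"leaked entries: {len(response_buffer)}")  # 35, not 0

asyncio.run(main())
```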
As for the cleanup mechanism: since the buffer should ideally only retain responses for active requests, we could track disconnected or cancelled request IDs. Then, using a non-blocking background task (e.g., a periodic coroutine) or another mechanism, we could scan the buffer and remove entries associated with those stale request IDs.
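A minimal sketch of that background task might look like this (hypothetical names; the real buffer layout in LitServe is more involved):

```python
import asyncio

# Filled in by whatever detects disconnects/cancellations (hypothetical hook).
stale_request_ids: set = set()

async def cleanup_response_buffer(response_buffer: dict, interval: float = 30.0) -> None:
    """Periodically drop buffer entries whose requests were cancelled or disconnected."""
    while True:
        await asyncio.sleep(interval)  # non-blocking; the event loop keeps serving
        # Intersect first so we only touch IDs that actually have buffer entries.
        for request_id in stale_request_ids & response_buffer.keys():
            response_buffer.pop(request_id, None)
            stale_request_ids.discard(request_id)
```

The task would be started once at server startup, e.g. `asyncio.create_task(cleanup_response_buffer(response_buffer))`.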
I remember implementing something similar in an older (now closed) PR, but for a different use case.
@aniketmaurya wdyt?
Thank you for elaborating here @bhimrazy! Sounds like a great idea.
Hi @kumarrah2002, would you like to take a look at this? It might be an interesting one.
Will do @bhimrazy @aniketmaurya. Do you think a timed expiration check would be the best approach here? We can add a timestamp field to the response buffer with an expiration of 3-5 minutes and have an asynchronous function that checks the buffer every minute. The timestamp (the new `float`) can go here:

`response_buffer: Dict[str, Union[Tuple[deque, asyncio.Event, float], asyncio.Event]],`
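As a rough sketch of what that sweep could look like, assuming `time.monotonic()` is stored as the third tuple element when the entry is created (the TTL and interval values mirror the numbers above):

```python
import asyncio
import time
from collections import deque
from typing import Dict, Tuple, Union

BufferEntry = Union[Tuple[deque, asyncio.Event, float], asyncio.Event]

async def expire_stale_entries(
    response_buffer: Dict[str, BufferEntry],
    ttl: float = 300.0,      # 5-minute expiration window
    interval: float = 60.0,  # sweep the buffer once a minute
) -> None:
    """Remove entries whose timestamp is older than `ttl` seconds."""
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        stale = [
            request_id
            for request_id, entry in response_buffer.items()
            # The Event-only variant carries no timestamp, so skip it here.
            if isinstance(entry, tuple) and now - entry[2] > ttl
        ]
        for request_id in stale:
            response_buffer.pop(request_id, None)
```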
Hi @kumarrah2002, sounds good to me. Just need to make sure we don't remove entries for clients that are still connected.
I think we can align the cleanup logic with the timeout value in LitServer: anything beyond that should be safe to treat as stale.
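Concretely, the staleness check in a sweep like the one above might become something along these lines, where `server_timeout` mirrors LitServer's configured timeout and `active_request_ids` is a hypothetical set maintained by the connection handlers:

```python
def is_stale(
    request_id: str,
    timestamp: float,
    now: float,
    server_timeout: float,
    active_request_ids: set,
) -> bool:
    # Only expire entries that have outlived the server timeout AND whose
    # client is no longer tracked as connected.
    return (now - timestamp) > server_timeout and request_id not in active_request_ids
```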
This seems to happen only in streaming cases, since for non-streaming responses the request/response cycle usually completes even if the client disconnects midway. I'll try to share a reproducible script sometime this or next week.