Feat: Evict requests if the client has disconnected
Before submitting
- [x] Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?
What does this PR do?
Fixes #165.
This PR handles the case where a client disconnects before its request completes. It tracks canceled requests and stops their ongoing tasks, saving computational resources.
Approach
The solution checks whether the client has disconnected during both predict and stream-predict operations.
If a disconnection is detected, the corresponding request ID is added to the `request_evicted_status` multiprocessing dictionary.
This dictionary is monitored by the running loops (both streaming and non-streaming) in the worker process. If a worker loop detects that the currently running request ID is present in `request_evicted_status`, it immediately terminates the task associated with that request.
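As a rough illustration, the server-side check could look like this (a minimal sketch: `request_evicted_status` is the dict from this PR, while the helper name, polling interval, and FastAPI wiring are assumptions, not the PR's actual code):

```python
import asyncio
from multiprocessing import Manager
from fastapi import Request

manager = Manager()
# Shared between the server and worker processes: request IDs whose
# clients have disconnected (dict name taken from this PR).
request_evicted_status = manager.dict()

async def monitor_disconnection(request: Request, uid: str):
    # Hypothetical helper: poll the connection state while the request
    # is in flight, and flag the uid once the client has gone away.
    while not await request.is_disconnected():
        await asyncio.sleep(0.5)
    request_evicted_status[uid] = True
```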
Potential Impacts
This approach might impact performance due to the additional overhead of monitoring and terminating tasks, which could introduce minor delays in processing.
Benchmarking
TODO
- [x] Handle client request disconnection in streaming mode
- [ ] Handle client request disconnection in non-streaming mode (In progress)
  - Investigating methods to terminate tasks from the worker process (`run_single_loop`); see the sketch after this list.
- [ ] Add/Improve tests (In progress)
  - Exploring ways to verify task termination from the worker process
- [ ] Handle client disconnection for batched requests
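For the non-streaming case, the worker-side check might look roughly like this (a sketch only; `request_queue`, `response_queue`, and `handle_request` are simplified stand-ins for LitServe's actual loop internals):

```python
def run_single_loop(request_queue, response_queue, request_evicted_status):
    # Sketch of an eviction check wrapped around the worker's main loop.
    while True:
        uid, x_enc = request_queue.get()
        # Client disconnected while the request was still queued: drop it.
        if request_evicted_status.pop(uid, False):
            continue
        y_enc = handle_request(x_enc)  # decode -> predict -> encode (stand-in)
        # Client disconnected mid-prediction: don't ship the response.
        if request_evicted_status.pop(uid, False):
            continue
        response_queue.put((uid, y_enc))
```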
Any help or guidance on this PR would be greatly appreciated 🙏. Thank you! 😊
PR review
Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃
looking good @bhimrazy so far! We might have to run some benchmarks to verify that we don't lose performance because of multiprocessing synchronization. But really good approach 😄
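For a first-order estimate of that synchronization cost, a micro-benchmark along these lines might help (a sketch, not from this PR; every Manager-dict access is an IPC round-trip, so this approximates the per-request overhead of one eviction check):

```python
import time
from multiprocessing import Manager

if __name__ == "__main__":
    d = Manager().dict()
    d["warm"] = True  # warm up the proxy connection
    n = 10_000
    t0 = time.perf_counter()
    for _ in range(n):
        d.get("warm")  # one lookup ~= one eviction check per request
    per_call = (time.perf_counter() - t0) / n
    print(f"~{per_call * 1e6:.1f} µs per Manager-dict lookup")
```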
Thanks for the feedback, @aniketmaurya! I'm currently exploring ways to test this feature and will keep you updated on the progress.
@bhimrazy I'd suggest going ahead with this technique, maybe implementing it for the single non-batched loop first; then we can run some tests, and if everything goes well we can implement it for the other loops too.
Sure, that sounds great!
Codecov Report
Attention: Patch coverage is 77.77778% with 12 lines in your changes missing coverage. Please review.
Project coverage is 81%. Comparing base (a65fadf) to head (6ffe51c). Report is 9 commits behind head on main.
Additional details and impacted files
```
@@           Coverage Diff           @@
##            main    #208     +/-  ##
=======================================
- Coverage     82%     81%     -0%
=======================================
  Files         13      13
  Lines       1048    1084     +36
=======================================
+ Hits         855     881     +26
- Misses       193     203     +10
```
Hi @aniketmaurya,
I'm looking for a bit of guidance on cancelling a task that has already started in `run_single_loop` (the non-streaming case), specifically this part of `run_single_loop`:
```python
# Inside run_single_loop: decode the request, run predict, encode the response.
x = _inject_context(
    context,
    lit_api.decode_request,
    x_enc,
)
y = _inject_context(
    context,
    lit_api.predict,
    x,
)
y_enc = _inject_context(
    context,
    lit_api.encode_response,
    y,
)
```
Also, I was testing with the following approach but wanted to get your opinion on whether this is a good way to handle it:
```python
async def process_steps(...):
    # Handle all steps: decode, predict, encode
    return ...

task = asyncio.create_task(process_steps(...))
while not task.done():
    # Check if the request was cancelled from the predict function
    if cancelled:
        task.cancel()
        break
    await asyncio.sleep(...)  # Check every ...ms

y_enc = await task
```
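Note that awaiting a task after cancelling it raises `asyncio.CancelledError`, so the final `await` would likely need a guard, e.g.:

```python
try:
    y_enc = await task
except asyncio.CancelledError:
    y_enc = None  # request was evicted; nothing to send back
```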
I'd really appreciate your insights. Thank you!
sure @bhimrazy, I will ping you on discord
@bhimrazy please add a proper PR description. LitServe is live now, so it needs to follow Lightning AI's guidelines for production-ready OSS software.
- clear PR descriptions
- tests
- documentation
thanks!
Closing this PR.
Due to the complexity involved, the streaming and non-streaming cases will be handled separately in new PRs (in a cleaner way).
You can find the streaming case PR here: #223.