
Feat: Evict requests if the client has disconnected

Open bhimrazy opened this issue 1 year ago • 8 comments

Before submitting
  • [x] Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Did you make sure to update the docs?
  • [x] Did you write any new necessary tests?

What does this PR do?

Fixes #165.

This PR handles the case where a client disconnects before its request completes. It tracks cancelled requests and stops the ongoing/running tasks, thereby saving computational resources.

Approach

The solution involves checking whether the client has disconnected during both predict and stream predict operations. If a disconnection is detected, the system adds the corresponding request ID to the request_evicted_status multiprocessing dictionary.

This dictionary is then monitored by the running loops (both streaming and non-streaming) in the worker processes. If a worker loop detects that the request ID it is currently processing is present in request_evicted_status, it immediately terminates the ongoing task associated with that request.
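
A rough, simplified sketch of this flow (illustrative names; the real implementation also injects context and handles batching/streaming, and the dict is passed to the workers when they are spawned):

import asyncio
from multiprocessing import Manager

from fastapi import Request


async def watch_for_disconnect(request: Request, uid: str, request_evicted_status) -> None:
    # Server side: poll the connection while the prediction is in flight.
    # In the real server this watcher would also stop once the response is sent.
    while not await request.is_disconnected():
        await asyncio.sleep(0.1)
    request_evicted_status[uid] = True


def handle_request(uid: str, x_enc, lit_api, request_evicted_status):
    # Worker side: bail out early if the client has already disconnected,
    # instead of spending compute on a response nobody will read.
    if request_evicted_status.pop(uid, False):
        return None
    x = lit_api.decode_request(x_enc)
    y = lit_api.predict(x)
    return lit_api.encode_response(y)


if __name__ == "__main__":
    # The shared dict is created once in the server process and handed to the workers.
    request_evicted_status = Manager().dict()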

Potential Impacts

This approach might impact performance due to the additional overhead of monitoring and terminating tasks, which could introduce minor delays in processing.

(Benchmarking results were attached as an image.)


TODO

  • [x] Handle client request disconnection in streaming mode
  • [ ] Handle client request disconnection in non-streaming mode (In progress)
    • Investigating methods to terminate tasks from the worker process (run_single_loop).
  • [ ] Add/Improve tests (In progress)
    • Exploring ways to verify task termination from the worker process (a rough test sketch is included after this list)
  • [ ] Handle client disconnection for batched requests
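
As a rough starting point for the test item above, a hedged sketch that simulates a disconnect via a client-side timeout (the fixture names litserve_url and request_evicted_status are hypothetical and stand in for whatever the test harness exposes):

import asyncio

import httpx
import pytest


@pytest.mark.asyncio
async def test_request_evicted_on_disconnect(litserve_url, request_evicted_status):
    # `litserve_url` and `request_evicted_status` are assumed fixtures that start the
    # server and expose its shared eviction dict; both names are placeholders.
    with pytest.raises(httpx.TimeoutException):
        async with httpx.AsyncClient(timeout=0.1) as client:
            # The short timeout makes the client drop the connection mid-request,
            # which is how we simulate a disconnect.
            await client.post(f"{litserve_url}/predict", json={"input": 1.0})
    # Give the worker loop a moment to notice the disconnect.
    await asyncio.sleep(0.5)
    assert any(request_evicted_status.values()), "request was never marked as evicted"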

Any help or guidance on this PR would be greatly appreciated 🙏. Thank you! 😊

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

bhimrazy avatar Aug 21 '24 04:08 bhimrazy

looking good @bhimrazy so far! We might have to run some benchmarks to verify that we don't lose performance because of multiprocessing synchronization. But really good approach 😄

aniketmaurya avatar Aug 21 '24 15:08 aniketmaurya

> looking good @bhimrazy so far! We might have to run some benchmarks to verify that we don't lose performance because of multiprocessing synchronization. But really good approach 😄

Thanks for the feedback, @aniketmaurya! I'm currently exploring ways to test this feature and will keep you updated on the progress.

bhimrazy avatar Aug 21 '24 16:08 bhimrazy

@bhimrazy I'd suggest going ahead with this technique and maybe implementing it for the single non-batched loop first, then we can run some tests. And if everything goes well, we can implement it for the other loops too.

aniketmaurya avatar Aug 21 '24 16:08 aniketmaurya

Sure, that sounds great!

bhimrazy avatar Aug 21 '24 17:08 bhimrazy

Codecov Report

Attention: Patch coverage is 77.77778% with 12 lines in your changes missing coverage. Please review.

Project coverage is 81%. Comparing base (a65fadf) to head (6ffe51c). Report is 9 commits behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #208   +/-   ##
===================================
- Coverage    82%    81%   -0%     
===================================
  Files        13     13           
  Lines      1048   1084   +36     
===================================
+ Hits        855    881   +26     
- Misses      193    203   +10     

codecov[bot] avatar Aug 21 '24 18:08 codecov[bot]

Hi @aniketmaurya, I'm looking for a bit of guidance on cancelling a task that has already started in run_single_loop (the non-streaming case).

Specifically, this part of run_single_loop:

  x = _inject_context(
      context,
      lit_api.decode_request,
      x_enc,
  )
  y = _inject_context(
      context,
      lit_api.predict,
      x,
  )
  y_enc = _inject_context(
      context,
      lit_api.encode_response,
      y,
  )

Also, I was testing with the following approach but wanted to get your opinion on whether this is a good way to handle it:

def process_steps(...):
    # Handle all steps: decode, predict, encode
    return ...

task = asyncio.create_task(process_steps(...))

while not task.done():
    # Check if the request was cancelled from the predict function
    if cancelled:
        task.cancel()
        break
    await asyncio.sleep(...)  # Check every ...ms

y_enc = await task
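
For context, here is a minimal standalone version of that pattern (placeholder names; since run_single_loop is synchronous, it would presumably need its own event loop, e.g. via asyncio.run):

import asyncio
import time


async def process_steps(x_enc):
    # Stand-in for decode_request -> predict -> encode_response.
    await asyncio.sleep(5)  # pretend this is a slow model call
    return f"encoded({x_enc})"


async def run_with_eviction(x_enc, is_cancelled):
    task = asyncio.create_task(process_steps(x_enc))
    while not task.done():
        if is_cancelled():          # e.g. uid in request_evicted_status
            task.cancel()
            break
        await asyncio.sleep(0.05)   # poll every 50 ms
    try:
        return await task
    except asyncio.CancelledError:
        return None                 # request was evicted, nothing to return


async def main():
    start = time.monotonic()
    # Simulate the client disconnecting ~0.2 s into the request.
    result = await run_with_eviction("payload", lambda: time.monotonic() - start > 0.2)
    print("result:", result)        # prints "result: None" after ~0.2 s


asyncio.run(main())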

I'd really appreciate your insights. Thank you!

bhimrazy avatar Aug 22 '24 06:08 bhimrazy

sure @bhimrazy, I will ping you on discord

aniketmaurya avatar Aug 22 '24 11:08 aniketmaurya

@bhimrazy please add a proper PR description. LitServe is live now, so it needs to follow Lightning AI's guidelines for production-ready OSS software.

  • clear PR descriptions
  • tests
  • documentation

thanks!

williamFalcon avatar Aug 23 '24 11:08 williamFalcon

Closing this PR.

Due to the complexity involved, the streaming and non-streaming cases will be handled separately in new PRs (in a better and cleaner way).

You can find the streaming case PR here: #223.

bhimrazy avatar Aug 26 '24 16:08 bhimrazy