
[Misc][API] Cache and Router refactoring for concurrent performance, concurrent safety and stateful routing.

Open • zhangjyr opened this issue • 4 comments

Pull Request Description

Refactoring for cache:

  1. Merge the separate pod, model, and metric mappings by introducing Pod metadata and Model metadata, held in two main thread-safe metadata registries.
  2. Remove the global cache mutex from the read path and replace it with multiple layers of finer-grained locks inside the metadata.
  3. The cache serves three contexts: the controller manager, the metadata service, and the gateway. Add cache initialization helpers for each context to allow later customization.
  4. Add more unit test cases covering the pod and model adapter changes.
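A minimal sketch of the two-layer locking idea in items 1 and 2, assuming Go 1.18+ generics. All type and method names here (PodMetadata, ModelMetadata, Registry, Cache, SetMetric, and so on) are hypothetical and not the PR's actual code. The outer layer is a registry backed by sync.Map, so lookups take no cache-wide lock; the inner layer is a per-entry RWMutex guarding that entry's mutable fields.

```go
package main

import (
	"fmt"
	"sync"
)

// PodMetadata is a hypothetical per-pod record. Its own RWMutex (inner lock
// layer) guards the metrics, so updating one pod never blocks other pods.
type PodMetadata struct {
	Name    string
	mu      sync.RWMutex
	metrics map[string]float64
}

func NewPodMetadata(name string) *PodMetadata {
	return &PodMetadata{Name: name, metrics: make(map[string]float64)}
}

func (p *PodMetadata) SetMetric(name string, v float64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.metrics[name] = v
}

func (p *PodMetadata) Metric(name string) (float64, bool) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	v, ok := p.metrics[name]
	return v, ok
}

// ModelMetadata is a hypothetical per-model record of the pods serving it.
type ModelMetadata struct {
	Name string
	mu   sync.RWMutex
	pods map[string]*PodMetadata
}

func NewModelMetadata(name string) *ModelMetadata {
	return &ModelMetadata{Name: name, pods: make(map[string]*PodMetadata)}
}

func (m *ModelMetadata) AddPod(p *PodMetadata) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.pods[p.Name] = p
}

func (m *ModelMetadata) PodCount() int {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return len(m.pods)
}

// Registry is a hypothetical thread-safe name -> metadata map (outer lock
// layer); sync.Map serves lookups without a registry-wide mutex.
type Registry[T any] struct{ m sync.Map }

func (r *Registry[T]) Store(key string, v *T) { r.m.Store(key, v) }
func (r *Registry[T]) Load(key string) (*T, bool) {
	v, ok := r.m.Load(key)
	if !ok {
		return nil, false
	}
	return v.(*T), true
}

// Cache holds the two registries instead of several globally locked maps.
type Cache struct {
	Pods   Registry[PodMetadata]
	Models Registry[ModelMetadata]
}

func main() {
	c := &Cache{}
	pod, model := NewPodMetadata("pod-a"), NewModelMetadata("llama")
	c.Pods.Store(pod.Name, pod)
	c.Models.Store(model.Name, model)
	model.AddPod(pod)
	// Concurrent readers/writers only contend on pod-a's own lock.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if p, ok := c.Pods.Load("pod-a"); ok {
				p.SetMetric("running_requests", float64(i))
				_, _ = p.Metric("running_requests")
			}
		}(i)
	}
	wg.Wait()
	m, _ := c.Models.Load("llama")
	fmt.Println(m.PodCount()) // prints 1
}
```

The design choice is that contention is scoped to a single pod or model: a metric update on one pod only locks that pod's entry, whereas a single global mutex would serialize every reader.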

Refactoring for router:

  1. Eliminate thread-unsafe map access in the Router interface.
  2. Merge the two contexts, context.Context and routing.RoutingContext, into a single RoutingContext.
  3. Abstract the Router interface and RoutingContext so that both the routing and cache packages can share them.
  4. RoutingContext now supports both synchronous and asynchronous routing resolution, following the Promise paradigm.

Related Issues

Resolves: #868

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
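As an illustration of the multi-prefix rule, a CI step could check titles with something like the following. The function titlePrefixes and its regular expression are hypothetical, not part of aibrix's tooling; the allowed set mirrors the list above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// allowedPrefixes mirrors the prefix list in the guidelines above.
var allowedPrefixes = map[string]bool{
	"Bug": true, "CI": true, "Docs": true, "API": true, "CLI": true, "Misc": true,
}

// leadingTags matches one or more "[Tag]" groups at the start of a title.
var leadingTags = regexp.MustCompile(`^(\[[A-Za-z]+\])+`)

// titlePrefixes is a hypothetical helper: it returns a title's leading
// prefixes in order, or an error if there are none or one is not allowed.
func titlePrefixes(title string) ([]string, error) {
	head := leadingTags.FindString(title)
	if head == "" {
		return nil, fmt.Errorf("title %q has no [Prefix]", title)
	}
	// "[Misc][API]" -> "Misc][API" -> ["Misc", "API"]
	tags := strings.Split(strings.Trim(head, "[]"), "][")
	for _, t := range tags {
		if !allowedPrefixes[t] {
			return nil, fmt.Errorf("unknown prefix [%s]", t)
		}
	}
	return tags, nil
}

func main() {
	tags, err := titlePrefixes("[Misc][API] Cache and Router refactoring")
	fmt.Println(tags, err) // [Misc API] <nil>
}
```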

Submission Checklist

  • [ ] PR title includes appropriate prefix(es)
  • [ ] Changes are clearly explained in the PR description
  • [ ] New and existing tests pass successfully
  • [ ] Code adheres to project style and best practices
  • [ ] Documentation updated to reflect changes (if applicable)
  • [ ] Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

zhangjyr • Mar 19 '25 21:03

Some code changes may conflict with https://github.com/vllm-project/aibrix/pull/878. That's not a problem for now; we can review the core logic first and rebase the changes later.

Jeffwan • Mar 19 '25 23:03

I will take a look today.

Jeffwan • Mar 21 '25 16:03

/cc @Xunzhuo @varungup90 Please help review it.

@zhangjyr this is still a huge PR. If we can reach agreement on the data structures and helper utilities, I suggest adding that add-only code first to make this change simpler.

Jeffwan • Mar 24 '25 22:03

I think we need the e2e tests to pass before reviewing.

Xunzhuo • Mar 25 '25 08:03

@Xunzhuo Jingyuan fixed the integration test issues. Do you have a chance to take a look?

Jeffwan • Mar 27 '25 18:03

@Xunzhuo Just wanted to loop you in: this PR is currently blocking some performance-related changes (prefix-cache-aware routing) that community users are waiting for. It's quite large and deserves more attention. As an exception, we plan to merge it this time to unblock the follow-up changes.

  • If you have some bandwidth, please help review this change; @zhangjyr will address your comments in follow-up PRs.
  • We do want to avoid such large PRs in the future. Big PRs like this are really tough to review properly and can slow everyone down. Contributors should split their changes into smaller, reviewable chunks to make the process smoother for everyone.

Jeffwan • Mar 28 '25 18:03