
[RoadMap][Call For Contributions] Mooncake Store V3 Roadmap

Open · stmatengss opened this issue 2 months ago · 4 comments

Milestone 1: Core Architecture Refactor & Decoupling

This milestone focuses on foundational architectural changes to improve modularity, flexibility, and prepare for future scaling.

  • [ ] (TE/Store Separation): Decouple the TE (Transfer Engine) and Store components into separate, independent packages.
  • [ ] (Client/Worker Decoupling): Decouple the dummy client from the worker to remove strong dependencies. @YiXR
    • https://github.com/kvcache-ai/Mooncake/pull/1084
    • https://github.com/kvcache-ai/Mooncake/pull/1122
    • #1146
  • [ ] (Flexible Deployment): Update the Store to support various flexible deployment models, such as client-only, client + master, etc. 
  • [ ] (Tensor-native APIs): Put/Get tensor APIs carry TP rank and model info.
    • #1127
  • [ ] (Layer-wise Storage): Maintain layer info on the Mooncake side to support streamed transfer and communication overlap.

Milestone 2: Master Service Enhancements

This milestone enhances the Master component to support new storage architectures and routing logic.

  • [ ] (Key-based Routing): Implement new key-based routing capabilities in the Master service. 
  • [ ] (Metadata Adaptation - Storage): Adapt the Master's metadata management to support the new multi-level storage architecture.
  • [ ] (Recovery): KV metadata persistence.
  • [ ] (KVCache Awareness Interface): Expose hit ratios for different layers.
  • [ ] (Metadata Adaptation - HA): Upgrade metadata schema and logic to meet new High Availability (HA) requirements. 
  • [ ] (Multi-tenancy): Support multiple tenants with different models, users, and auth keys.
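For the key-based routing item, one conventional realization is a consistent-hash ring in the Master that maps each KVCache key to a worker location, which a router can then query. This sketch is purely illustrative (class and method names are hypothetical), not Mooncake's actual design:

```python
import hashlib
from bisect import bisect_right

# Hypothetical sketch of key-based routing in a master service: a consistent
# hash ring maps each key deterministically to a worker, so adding or removing
# a worker only remaps a fraction of keys. Illustrative only.

class KeyRouter:
    def __init__(self, workers: list[str], vnodes: int = 64):
        # Each worker gets `vnodes` points on the ring for smoother balance.
        self._ring = sorted(
            (self._hash(f"{w}#{i}"), w) for w in workers for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        # Stable 64-bit hash derived from SHA-256.
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def route(self, key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash.
        idx = bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Because the hash is stable, any Master replica constructed with the same worker list answers a location query identically, which also helps the HA metadata item above.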

Milestone 3: Worker: Multi-Level Storage Architecture

This is a major epic to build the next-generation multi-level storage system within the Worker.

  • 3.1: Abstraction & Caching

    • [ ] (Storage Abstraction Layer): Design and implement the core abstraction layer for multi-level storage. 
    • [ ] (Cache Scheduling Interface): Design the abstract interface for cache scheduling logic. 
    • [x] (Eviction Logic): Implement basic data eviction logic within the new storage architecture.
      • @Vincent-Bo-ali #1028
    • [ ] (LRU Cache): Implement an LRU (Least Recently Used) policy as the default cache scheduling strategy. 
    • [ ] (Local Client Cache): Keep a local cache for better performance.
      • #1062
  • 3.2: Storage Backend Implementation

    • [ ] (DRAM Adaptation): Adapt the storage layer for DRAM, including support for NUMA affinity. 
    • [ ] (SSD Adaptation): Adapt the storage layer for SSDs, enabling local external storage read/write capabilities. 
      • https://github.com/kvcache-ai/Mooncake/issues/1054
    • [ ] (VRAM Adaptation): Adapt the storage layer to utilize VRAM. 
    • [ ] (Huawei NPU Adaptation): Implement support for Huawei NPUs (H2D). 
  • 3.3: Elastic KVCache Storage

    • [ ] (KVCache Migration): Move KVCache between Mooncake clients.
    • [ ] (Data Replicas): Dynamic replication.
      • #1100

Milestone 4: Worker: Networking & Elasticity

This milestone focuses on refactoring worker communication and enabling resource elasticity.

  • [ ] (RPC Refactor): [Phase 1] Refactor the worker's read/write logic to replace RDMA with RPC-based communication. 
  • [x] (Barex Transport Support): Support Alibaba barex transport in TE for Mooncake Store.
    • #1045
  • [ ] (Resource Elasticity): Implement single-worker resource elasticity. 
  • [ ] (Event-driven completion): Provide an option to use event-driven completion notifications in the worker instead of busy-polling.
    • #1033
    • #1053
  • [ ] (IPv6 Support): Support IPv6 in client, master and metadata server.
    • #1043
    • #1067

Milestone 5: Deployment & Operations

This milestone covers K8s integration (i.e., RBG, https://github.com/sgl-project/rbg) and build process improvements.

  • [ ] (K8s Autoscaling): Implement support for Kubernetes-based autoscaling of worker and dummy client instances.
  • [ ] (Scenario-based Builds): Implement a build system capable of producing different worker binaries optimized for different scenarios. 
  • [ ] (Integration With AI Configurator): Use AI Configurator to better measure worker resources and other configurations.
  • [ ] (Deployment Documentation & Guides): Create comprehensive, up-to-date deployment documentation and step-by-step setup guides to simplify installation and configuration for all environments.

Milestone 6: Conductor

This feature implements a standalone or co-located conductor for use by the Gateway. @zhongzhouTan-coder @yejj710 @Liziqi-77 @Keithwwa @Asher-XunZhang

  • [ ] (KV Event): Support Mooncake KV event publishing & management.
  • [ ] (KV Metrics): Support max cache-hit computation.
  • [ ] (Cache-aware Routing): Implement cache-aware routing.
  • [ ] (Controller): conductor reverse proxy and P/D disaggregated control

Milestone 7: Pytorch Eco-system

  • [ ] (Tensor Attributes): Support tensor attributes.
  • [ ] (Native torch format offload)

Milestone 8: CI & CD enhancement

  • [ ] (End-to-end CI tests): For SGLang, support HiCache, PD, Elastic EP, and checkpoint-engine tests.

Milestone 9: Performance & Benchmarks

  • [ ] (Store Master Benchmark): Design and integrate a dedicated benchmark for the Mooncake store master module to evaluate throughput, latency, and scalability.

Thanks for being a part of the Mooncake community! You're welcome to discuss and contribute!


If you have any ideas, just leave a comment below and help shape the Roadmap.

stmatengss · Nov 07 '25 17:11

Celebrate! Finally, we have the V3 Roadmap! I have a few questions I'd like to ask:

  1. What is the motivation of 'key-based routing'?
  2. Have you considered more diversified cache scheduling strategies, or designed scalability for it?
  3. In addition, could you share some details about the 'Cache Scheduling Interface' and its design concepts?

Keithwwa · Nov 11 '25 02:11

> Celebrate! Finally, we have the V3 Roadmap! I have a few questions I'd like to ask:
>
> 1. What is the motivation of 'key-based routing'?
> 2. Have you considered more diversified cache scheduling strategies, or designed scalability for it?
> 3. In addition, could you share some details about the 'Cache Scheduling Interface' and its design concepts?

  1. It means we can provide a KVCache location query service for the Router.
  2. No updates for now; feel free to give suggestions, and I will add them to the Roadmap.
  3. It includes migration, promotion/demotion, and hot-data detection (hot data will have more replicas).

stmatengss · Nov 11 '25 06:11

Congratulations, what a big V3 roadmap!!!! I am very interested in Cache Scheduling, such as hot-data migration. I hope I can contribute and do some work to help build a more powerful project~ ╰(°▽°)╯

zhongzhouTan-coder · Nov 14 '25 01:11

> Congratulations, what a big V3 roadmap!!!! I am very interested in Cache Scheduling, such as hot-data migration. I hope I can contribute and do some work to help build a more powerful project~ ╰(°▽°)╯

Cool! It's an important feature for elastic deployment. Welcome to join the Slack channel for offline discussion.

stmatengss · Nov 14 '25 14:11