[RoadMap][Call For Contributions] Mooncake Store V3 Roadmap
Milestone 1: Core Architecture Refactor & Decoupling
This milestone focuses on foundational architectural changes to improve modularity, flexibility, and prepare for future scaling.
- [ ] (TE/Store Separation): Decouple the TE (Transfer Engine) and Store components into separate, independent packages.
- [ ] (Client/Worker Decoupling): Decouple the dummy client from the worker to remove strong dependencies. @YiXR
- https://github.com/kvcache-ai/Mooncake/pull/1084
- https://github.com/kvcache-ai/Mooncake/pull/1122
- #1146
- [ ] (Flexible Deployment): Update the Store to support flexible deployment models, such as client-only and client + master.
- [ ] (Tensor-native APIs): Extend the Put/Get Tensor APIs to carry TP rank and model info.
- #1127
- [ ] (Layer-wise Storage): Maintain layer info on the Mooncake side to support streamed transfer and communication overlap.
Milestone 2: Master Service Enhancements
This milestone enhances the Master component to support new storage architectures and routing logic.
- [ ] (Key-based Routing): Implement new key-based routing capabilities in the Master service.
- [ ] (Metadata Adaptation - Storage): Adapt the Master's metadata management to support the new multi-level storage architecture.
- [ ] (Recovery): KV metadata persistence.
- [ ] (KVCache Awareness Interface): Expose hit ratios for different layers.
- [ ] (Metadata Adaptation - HA): Upgrade metadata schema and logic to meet new High Availability (HA) requirements.
- [ ] (Multi-tenant): Support multi-tenancy with different models, users, and auth keys.
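A minimal sketch of what key-based routing in the Master could look like, assuming the Master keeps per-key location metadata and falls back to deterministic hash placement for unknown keys (the class and method names are illustrative, not the actual design):

```python
import hashlib

# Illustrative key-based router: answers "which worker holds this
# KVCache key?" for the Router/Gateway.
class KeyRouter:
    def __init__(self, workers: list[str]):
        self.workers = workers
        self.locations: dict[str, str] = {}  # key -> worker holding it

    def record_put(self, key: str, worker: str) -> None:
        # Master updates location metadata when a worker stores a key.
        self.locations[key] = worker

    def route(self, key: str) -> str:
        # Prefer the worker known to hold the key (cache-aware lookup);
        # otherwise fall back to deterministic hash placement.
        if key in self.locations:
            return self.locations[key]
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        return self.workers[h % len(self.workers)]
```

As noted in the discussion below the roadmap, the motivation is to provide a KVCache location query service for the Router.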
Milestone 3: Worker: Multi-Level Storage Architecture
This is a major epic to build the next-generation multi-level storage system within the Worker.
3.1: Abstraction & Caching
- [ ] (Storage Abstraction Layer): Design and implement the core abstraction layer for multi-level storage.
- [ ] (Cache Scheduling Interface): Design the abstract interface for cache scheduling logic.
- [x] (Eviction Logic): Implement basic data eviction logic within the new storage architecture.
- @Vincent-Bo-ali #1028
- [ ] (LRU Cache): Implement an LRU (Least Recently Used) policy as the default cache scheduling strategy.
- [ ] (Local Client Cache): Maintain a local client-side cache for better performance.
- #1062
3.2: Storage Backend Implementation
- [ ] (DRAM Adaptation): Adapt the storage layer for DRAM, including support for NUMA affinity.
- [ ] (SSD Adaptation): Adapt the storage layer for SSDs, enabling local external storage read/write capabilities.
- https://github.com/kvcache-ai/Mooncake/issues/1054
- [ ] (VRAM Adaptation): Adapt the storage layer to utilize VRAM.
- [ ] (Huawei NPU Adaptation): Implement support for Huawei NPUs (H2D).
3.3: Elastic KVCache Storage
- [ ] (KVCache Migration): Migrate KVCache between Mooncake clients.
- [ ] (Data Replication): Support dynamic replication.
- #1100
Milestone 4: Worker: Networking & Elasticity
This milestone focuses on refactoring worker communication and enabling resource elasticity.
- [ ] (RPC Refactor): [Phase 1] Refactor the worker's read/write logic to replace RDMA with RPC-based communication.
- [x] (Barex Transport Support): Support Alibaba barex transport in TE for Mooncake Store.
- #1045
- [ ] (Resource Elasticity): Implement single-worker resource elasticity.
- [ ] (Event-driven Completion): Provide an option for event-driven completion notification in the worker instead of busy-polling.
- #1033
- #1053
- [ ] (IPv6 Support): Support IPv6 in client, master and metadata server.
- #1043
- #1067
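To illustrate the event-driven completion item above, here is a small Python sketch that blocks on a completion queue rather than spinning on a flag; the names are hypothetical, and a real implementation would wait on RDMA/RPC completion events rather than a thread-safe queue:

```python
import threading
import queue

# Completions are delivered to a blocking queue instead of a flag that
# the caller busy-polls; the waiting thread sleeps until one arrives.
completions: queue.Queue[int] = queue.Queue()

def worker_task(task_id: int) -> None:
    # ... perform the transfer, then signal completion ...
    completions.put(task_id)

def wait_event_driven() -> int:
    # Block until a completion arrives; this frees the CPU core between
    # completions, unlike a busy-poll loop that burns cycles checking.
    return completions.get()

t = threading.Thread(target=worker_task, args=(42,))
t.start()
done = wait_event_driven()
t.join()
```

The trade-off is the usual one: busy-polling minimizes notification latency at the cost of a dedicated core, while event-driven waiting frees the core at the cost of a wakeup.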
Milestone 5: Deployment & Operations
This milestone covers K8s integration (i.e., RBG, https://github.com/sgl-project/rbg) and build process improvements.
- [ ] (K8s Autoscaling): Implement support for Kubernetes-based autoscaling of worker and dummy client instances.
- [ ] (Scenario-based Builds): Implement a build system capable of producing different worker binaries optimized for different scenarios.
- [ ] (Integration with AI Configurator): Use AI Configurator to better estimate worker resources and other configuration settings.
- [ ] (Deployment Documentation & Guides): Create comprehensive, up-to-date deployment documentation and step-by-step setup guides to simplify installation and configuration for all environments.
Milestone 6: Conductor
This feature implements a standalone or co-located conductor for use by the Gateway. @zhongzhouTan-coder @yejj710 @Liziqi-77 @Keithwwa @Asher-XunZhang
- [ ] (KV Event): Support Mooncake KV event publishing and management.
- [ ] (KV Metrics): Support max cache-hit computation.
- [ ] (Cache-aware Routing): Implement cache-aware routing.
- [ ] (Controller): Implement the conductor reverse proxy and P/D disaggregated control.
Milestone 7: PyTorch Ecosystem
- [ ] (Support Tensor Attributions)
- [ ] (Native torch format offload)
Milestone 8: CI & CD enhancement
- [ ] (End-to-end CI tests): For SGLang, support HiCache, PD, Elastic EP, and checkpoint engine tests.
Milestone 9: Performance & Benchmarks
- [ ] (Store Master Benchmark): Design and integrate a dedicated benchmark for the Mooncake store master module to evaluate throughput, latency, and scalability.
Thanks for being a part of the Mooncake community! Welcome to discuss and contribute!
If you have any ideas, just leave a comment below and help shape the Roadmap.
Celebrate! Finally, we have the V3 Roadmap! I have a few questions I'd like to ask:
- What is the motivation of 'key-based routing'?
- Have you considered more diversified cache scheduling strategies, or designed scalability for it?
- In addition, could you share some details about the 'Cache Scheduling Interface' and its design concepts?
- It means we can provide a KVCache location query service for the Router.
- No updates now; you can give any suggestions, and I will add them to the Roadmap.
- It includes migration, promotion/demotion, and hot-data detection (hot-data will have more replicas).
Congratulations on such a big v3 roadmap!!!! I am very interested in Cache Scheduling topics such as hot-data migration, and I hope I can join in and contribute some work to help build a more powerful project~ ╰(°▽°)╯
Cool! It's an important feature for elastic deployment. Welcome to join the Slack channel for offline discussion.