rbg
Development Roadmap (v0.6.0)
Here is the development roadmap for v0.6.0. Contributions and feedback are welcome.
P1: Critical Path Items
- In-place Update Enhancement
  - [ ] Warm-up stage: Introduce pre-pulled images and pre-warmed models to reduce service downtime (Owner: @Syspretor)
  - [ ] In-place recreate: Fallback mechanism when standard in-place updates fail
  - [ ] Resource reservation: Ensure deterministic scheduling during non-in-place updates
  - [ ] Redundant capacity: Warm up spare capacity to accelerate MaxSurge readiness and GPU utilization
- Coordinated Update Improvements
  - [ ] Clarify trigger conditions
  - [ ] State machine tracking
  - [ ] Dependency configuration interactions
- Workload Management
  - [ ] InstanceSet stateful mode: Enable as default workload with Stateful/LWS compatibility
  - [ ] Template optimization: Reduce duplication via `templateRef`
- Documentation
  - [ ] Mooncake Deployment Guide
  - [ ] End-to-End Upgrade Practices
  - [ ] InstanceSet Deployment Procedures (single-node/multi-node)
  - [ ] Coordinated Update Specifications
Coordination
- [ ] Coordinated Scaling: Scale specific roles by defined ratios during scaling events
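
To make the intent of ratio-based scaling concrete, here is a minimal Python sketch of the arithmetic only. It is not RBG code or an RBG API: the function name `scale_by_ratio`, the role names `prefill` and `decode`, and the choice to round up are all assumptions made for illustration.

```python
def scale_by_ratio(ratios: dict[str, int], anchor_role: str, anchor_replicas: int) -> dict[str, int]:
    """Compute replica counts for every role from one anchor role.

    ratios maps a (hypothetical) role name to its integer weight, e.g.
    {"prefill": 1, "decode": 2} means two decode replicas per prefill replica.
    Results are rounded up so no role falls below its share (an assumption).
    """
    if anchor_role not in ratios:
        raise KeyError(f"unknown role: {anchor_role}")
    if anchor_replicas < 0:
        raise ValueError("anchor_replicas must be non-negative")
    if any(weight <= 0 for weight in ratios.values()):
        raise ValueError("all ratio weights must be positive")

    base = ratios[anchor_role]
    # Integer ceiling division avoids floating-point rounding surprises.
    return {
        role: -(-anchor_replicas * weight // base)
        for role, weight in ratios.items()
    }


# Scaling the hypothetical "prefill" role to 3 replicas with a 1:2 ratio.
plan = scale_by_ratio({"prefill": 1, "decode": 2}, "prefill", 3)
print(plan)
```

Rounding up is only one possible policy; a real controller might prefer rounding down or enforcing per-role minimums and maximums.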
Schedule
- [ ] Flexible Topology Scheduling: Multi-level scheduling with hard/soft constraints and weighted preferences
- [ ] Multi-level Gang Scheduling: Co-scheduling for dependent pod groups
- [ ] Coordinated Scheduling: Enforce affinity/anti-affinity policies between coordinated roles
RoleBasedGroupSet
- [ ] RBGS-level RollingUpdate implementation
- [ ] State machine refinement and status reporting
CLI (rbgctl) Enhancement
- [ ] SLA-driven configuration: Integrate Dynamo AIConfigurator for initial RBG recommendations
- [ ] Lifecycle management improvements
/assign Coordinated Scaling: Scale specific roles by defined ratios during scaling events