Development Roadmap (v0.6.0)

Open • Syspretor opened this issue 1 month ago • 1 comment

Here is the development roadmap for v0.6.0. Contributions and feedback are welcome.

P1: Critical Path Items

  • In-place Update Enhancement

    • [ ] Warm-up stage: Introduce pre-pulled images and pre-warmed models to reduce service downtime (Owner: @Syspretor)
    • [ ] In-place recreate: Fallback mechanism when standard in-place updates fail
    • [ ] Resource reservation: Ensure deterministic scheduling during non-in-place updates
    • [ ] Redundant capacity: Warm up spare capacity to speed up MaxSurge readiness and improve GPU utilization
  • Coordinated Update Improvements

    • [ ] Clarify trigger conditions
    • [ ] State machine tracking
    • [ ] Dependency configuration interactions
  • Workload Management

    • [ ] InstanceSet stateful mode: Enable as default workload with Stateful/LWS compatibility
    • [ ] Template optimization: Reduce duplication via templateRef
  • Documentation

    • [ ] Mooncake Deployment Guide
    • [ ] End-to-End Upgrade Practices
    • [ ] InstanceSet Deployment Procedures (single-node/multi-node)
    • [ ] Coordinated Update Specifications

Coordination

  • [ ] Coordinated Scaling: Scale specific roles by defined ratios during scaling events
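
To make the Coordinated Scaling item more concrete, here is a minimal sketch of ratio-based replica calculation. It is not the RBG implementation: the function name `scale_by_ratio`, the role names, and the choice to round up are all assumptions made for illustration. The idea is that when one role (the anchor) is scaled, the other roles follow so the configured ratio is preserved.

```python
from typing import Dict


def scale_by_ratio(ratios: Dict[str, int], anchor_role: str, anchor_replicas: int) -> Dict[str, int]:
    """Return replica counts for all roles, keeping each role's ratio to the anchor.

    Hypothetical helper: `ratios` maps role name -> relative weight, e.g.
    {"prefill": 1, "decode": 2} means two decode replicas per prefill replica.
    Counts are rounded up so no role ends up under-provisioned (an assumption).
    """
    if anchor_role not in ratios:
        raise KeyError(f"unknown role: {anchor_role}")
    base = ratios[anchor_role]
    if base <= 0:
        raise ValueError("ratio weights must be positive")
    if anchor_replicas < 0:
        raise ValueError("replica count must be non-negative")

    result = {}
    for role, weight in ratios.items():
        if weight <= 0:
            raise ValueError("ratio weights must be positive")
        # Integer ceiling division avoids floating-point rounding surprises.
        result[role] = -(-anchor_replicas * weight // base)
    return result


if __name__ == "__main__":
    # Scaling the hypothetical "prefill" role to 3 pulls "decode" to 6.
    print(scale_by_ratio({"prefill": 1, "decode": 2}, "prefill", 3))
```

Rounding up is only one possible policy; a real controller might instead round down, or reject a scale request that cannot satisfy the ratio exactly.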

Schedule

  • [ ] Flexible Topology Scheduling:
    Multi-level scheduling with hard/soft constraints and weighted preferences
  • [ ] Multi-level Gang Scheduling:
    Co-scheduling for dependent pod groups
  • [ ] Coordinated Scheduling:
    Enforce affinity/anti-affinity policies between coordinated roles
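
For readers less familiar with gang scheduling, the all-or-nothing idea behind the Multi-level Gang Scheduling item can be sketched as follows. This is a toy model, not the planned design: `gang_schedule`, the single integer resource per pod, and the first-fit placement are assumptions chosen to keep the example short.

```python
from typing import Dict, List, Optional


def gang_schedule(pod_requests: List[int], node_free: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Place every pod of a group, or none of them.

    Hypothetical helper: `pod_requests[i]` is the resource request of pod i
    (for example a GPU count) and `node_free` maps node name -> free capacity.
    Returns a pod-index -> node mapping, or None if any pod cannot be placed.
    """
    # Work on a copy so a failed attempt leaves the caller's view untouched.
    free = dict(node_free)
    placement: Dict[int, str] = {}

    # Place the largest requests first; first-fit tends to fail less often that way.
    order = sorted(range(len(pod_requests)), key=lambda i: -pod_requests[i])
    for idx in order:
        request = pod_requests[idx]
        node = next((name for name, cap in free.items() if cap >= request), None)
        if node is None:
            return None  # one pod does not fit, so the whole group is rejected
        free[node] -= request
        placement[idx] = node
    return placement


if __name__ == "__main__":
    print(gang_schedule([4, 4], {"node-a": 4, "node-b": 4}))
    print(gang_schedule([4, 4, 4], {"node-a": 4, "node-b": 4}))
```

A multi-level variant would presumably apply the same all-or-nothing check at more than one grouping level (for instance per role and per group), but how those levels are defined is left to the design for this item.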

RoleBasedGroupSet

  • [ ] RBGS-level RollingUpdate implementation
  • [ ] State machine refinement and status reporting

CLI (rbgctl) Enhancement

  • [ ] SLA-driven configuration: Integrate Dynamo AIConfigurator for initial RBG recommendations
  • [ ] Lifecycle management improvements

Syspretor • Nov 25 '25 13:11

/assign Coordinated Scaling: Scale specific roles by defined ratios during scaling events

bcfre • Nov 26 '25 07:11