[Feature] Support More Flexible Topology Scheduling Annotations

Open cheyang opened this issue 3 months ago • 0 comments

Checklist

  • [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • [x] 2. Please use English, otherwise it will be closed.

Motivation

Problem Description

RBG/RBGS currently supports only the exclusive-topology annotation, which enforces mandatory binding to a single topology domain. To cover more complex scheduling scenarios, we need:

  1. Multi-level Topology Scheduling: Priority-based domain selection with fallback
  2. Hard/Soft Constraint Separation: Mandatory vs. preferred constraints
  3. Weighted Preferences: Granular priority control for soft constraints

Proposal

Enhanced annotations to complement exclusive-topology:

| Annotation Name | Type | Description | Example Value |
|---|---|---|---|
| group-required-topology | Hard constraint | Mandatory domains (sequentially attempted) | kubernetes.io/hostname,topology.kubernetes.io/zone or kubernetes.io/hostname |
| group-preferred-topology | Weighted soft constraint | Preferred domains with priority weights | topology.kubernetes.io/rack:80,topology.kubernetes.io/zone:20 or topology.kubernetes.io/rack, whose default weight is 50 |

Weight Configuration Syntax:
<topology-key-1>:<weight-1>,<topology-key-2>:<weight-2>

  • Weight range: 1-100 (higher = stronger preference)
  • Default weight: 50 if unspecified
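
To make the syntax above concrete, here is a minimal parsing sketch in Python. It is an illustration only, not the RBG implementation: the function name `parse_preferred_topology` and the error messages are hypothetical, and it assumes topology keys never contain a `:` character.

```python
DEFAULT_WEIGHT = 50               # proposal: applied when no weight is given
MIN_WEIGHT, MAX_WEIGHT = 1, 100   # proposal: valid weight range


def parse_preferred_topology(value: str) -> list[tuple[str, int]]:
    """Parse "<key>:<weight>,<key>" into ordered (topology_key, weight) pairs.

    Hypothetical helper for illustration; not part of RBG.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        # Assumption: keys such as topology.kubernetes.io/rack contain no ':',
        # so the last ':' (if any) separates the key from its weight.
        key, sep, raw_weight = entry.rpartition(":")
        if not sep:
            # No ':' present: the whole entry is the key, use the default weight.
            key, weight = entry, DEFAULT_WEIGHT
        else:
            try:
                weight = int(raw_weight)
            except ValueError:
                raise ValueError(f"weight is not an integer: {entry!r}") from None
        key = key.strip()
        if not key:
            raise ValueError(f"missing topology key: {entry!r}")
        if not MIN_WEIGHT <= weight <= MAX_WEIGHT:
            raise ValueError(f"weight out of range 1-100: {entry!r}")
        pairs.append((key, weight))
    return pairs


print(parse_preferred_topology("topology.kubernetes.io/rack"))
# prints [('topology.kubernetes.io/rack', 50)]
```

Rejecting out-of-range weights rather than clamping them is a design choice made for this sketch; the proposal only asks that the range be validated.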

Usage Example

apiVersion: workloads.x-k8s.io/v1alpha1  
kind: RoleBasedGroup  
metadata:  
  annotations:  
    rolebasedgroup.workloads.x-k8s.io/group-required-topology: "kubernetes.io/hostname,topology.kubernetes.io/zone"  
    rolebasedgroup.workloads.x-k8s.io/group-preferred-topology: "topology.kubernetes.io/rack:80,topology.kubernetes.io/zone:20"  
spec:  
  roles:  
    - name: leader  
      replicas: 1  
    - name: worker  
      replicas: 3  

Interpretation:

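  1. Hard constraint: All pods in the group must share one topology domain. Same host is attempted first; if that cannot be satisfied, scheduling falls back to same zone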
  2. Soft constraint: Strong preference (80) for same rack, weak preference (20) for same zone
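
One possible reading of this interpretation, sketched in Python. Everything here is an assumption made for illustration: the node representation (dicts with `name`, `labels`, and a `free_slots` capacity count), the function names, and the rule that a soft preference counts only when all chosen nodes share the same label value. The real scheduler integration may work quite differently.

```python
from collections import defaultdict


def pick_required_domain(nodes, required_keys, pods_needed):
    """Try each required topology key in order and return the first domain
    that can hold the whole group, as (key, domain_value, node_names).

    Returns None when no key yields a large enough domain, i.e. the hard
    constraint cannot be satisfied. Hypothetical helper, not part of RBG.
    """
    for key in required_keys:
        # Group nodes by their value for this topology key.
        domains = defaultdict(list)
        for node in nodes:
            value = node["labels"].get(key)
            if value is not None:
                domains[value].append(node)
        # Sorted only to make the choice deterministic in this sketch.
        for value, members in sorted(domains.items()):
            if sum(n["free_slots"] for n in members) >= pods_needed:
                return key, value, [n["name"] for n in members]
    return None


def preference_score(nodes, preferred):
    """Sum the weights of preferred keys on which all nodes share one value."""
    score = 0
    for key, weight in preferred:
        values = {n["labels"].get(key) for n in nodes}
        if len(values) == 1 and None not in values:
            score += weight
    return score


HOST, ZONE, RACK = ("kubernetes.io/hostname", "topology.kubernetes.io/zone",
                    "topology.kubernetes.io/rack")
nodes = [
    {"name": "node-a", "free_slots": 2, "labels": {HOST: "node-a", ZONE: "z1", RACK: "r1"}},
    {"name": "node-b", "free_slots": 2, "labels": {HOST: "node-b", ZONE: "z1", RACK: "r2"}},
    {"name": "node-c", "free_slots": 3, "labels": {HOST: "node-c", ZONE: "z2", RACK: "r3"}},
]

# 1 leader + 3 workers = 4 pods: no single host fits, so zone z1 is chosen.
print(pick_required_domain(nodes, [HOST, ZONE], pods_needed=4))
# node-a and node-b share a zone but not a rack, so only the weight 20 counts.
print(preference_score(nodes[:2], [(RACK, 80), (ZONE, 20)]))
```

With the usage example's group of four pods, the sketch falls back from host to zone, matching the "same host → same zone" order in the annotation value.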

Acceptance Criteria

  1. [ ] Annotation parsing with weight support
  2. [ ] Multi-level topology scheduling with fallback
  3. [ ] Weighted preference handling in scheduler
  4. [ ] Default weight (50) application when unspecified
  5. [ ] Validation for weight range (1-100)
  6. [ ] Updated documentation with weighted examples
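
For criterion 3, one option worth considering is to translate the parsed soft constraints into standard Kubernetes pod affinity, whose `preferredDuringSchedulingIgnoredDuringExecution` terms also take a weight in the range 1-100. The Python sketch below builds those terms as plain dicts. It is a suggestion under assumptions, not the planned design: the function name and the group label key `example.com/group-name` are hypothetical placeholders.

```python
def preferred_affinity_terms(preferred, group_label_key, group_name):
    """Build weighted pod-affinity terms from (topology_key, weight) pairs.

    Each term asks the scheduler to prefer co-locating a pod with other pods
    carrying the same group label, within the given topology domain.
    Hypothetical helper; the label key is supplied by the caller.
    """
    return [
        {
            "weight": weight,
            "podAffinityTerm": {
                "labelSelector": {"matchLabels": {group_label_key: group_name}},
                "topologyKey": key,
            },
        }
        for key, weight in preferred
    ]


affinity = {
    "podAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": preferred_affinity_terms(
            [("topology.kubernetes.io/rack", 80), ("topology.kubernetes.io/zone", 20)],
            group_label_key="example.com/group-name",  # hypothetical label key
            group_name="my-group",
        )
    }
}
print(affinity)
```

This covers only the soft constraints. The ordered fallback of group-required-topology has no direct equivalent in a single required affinity term and would need separate handling.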

Related resources

No response

cheyang, Oct 10 '25 03:10