[Feature] karmada-scheduler: support custom plugins in the ReplicaScheduling stage
What would you like to be added:
Currently, the karmada-scheduler only supports adding custom plugins in the first two scheduling stages (FilterClusters, ScoreCluster). I hope that custom plugins can also be supported in the last stage (ReplicaScheduling) to enable additional functionality.
The four scheduling stages are described in this document: https://karmada.io/docs/next/developers/customize-karmada-scheduler/
Why is this needed:
Currently, I am working to enable Volcano to support multi-cluster AI workload scheduling. Volcano currently only supports single-cluster operation. We believe that Karmada is well suited to combining with Volcano to achieve multi-cluster workload scheduling.
In Volcano, the most basic and important scheduling strategy is gang scheduling, which essentially means that the cluster's resources must be sufficient to launch all the pods required by the target job before any of them start. With Kubernetes' default scheduler, if a job requires 100 pods, each requesting 1GB of memory, but the cluster only has 50GB available, the job will still start with only some of its pods. Because too few pods are running, the job cannot make progress, yet it holds the resources it acquired and blocks them for the rest of the cluster.
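To make the all-or-nothing rule concrete, here is a minimal Go sketch that contrasts the two behaviours described above. The function names and the numbers (100 pods, 1 GiB each, 50 GiB free) are illustrative assumptions, not code from Volcano or Kubernetes.

```go
package main

import "fmt"

// startablePods returns how many pods of a job could start given the memory
// that is still free, capped at the number of replicas the job asks for.
// This mimics pod-by-pod placement, which admits a job partially.
func startablePods(replicas, perPodGiB, freeGiB int64) int64 {
	if replicas <= 0 || perPodGiB <= 0 || freeGiB <= 0 {
		return 0
	}
	fit := freeGiB / perPodGiB
	if fit > replicas {
		return replicas
	}
	return fit
}

// gangAdmit applies the all-or-nothing rule: the job is admitted only if
// every one of its pods can start.
func gangAdmit(replicas, perPodGiB, freeGiB int64) bool {
	return replicas > 0 && startablePods(replicas, perPodGiB, freeGiB) == replicas
}

func main() {
	// Hypothetical job: 100 pods, 1 GiB each, on a cluster with 50 GiB free.
	fmt.Println(startablePods(100, 1, 50)) // 50: a partial start that makes no progress
	fmt.Println(gangAdmit(100, 1, 50))     // false: gang scheduling rejects the job
	fmt.Println(gangAdmit(100, 1, 128))    // true: all 100 pods fit
}
```

Under pod-by-pod placement the job would hold 50 GiB without being able to run, while the gang check refuses it outright and leaves the memory free for other jobs.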
If custom plugins were supported in the ReplicaScheduling stage, we could determine during this stage whether the job can be fully scheduled to a single target cluster. If it cannot, we can halt the scheduling process. Currently, our scheduling goal is that once a user creates a vcjob and PropagationPolicy, the vcjob is scheduled to a single cluster (splitting the job across multiple clusters is more complex and will be considered later). This ensures that the vcjob is not divided and that the target cluster can fully start it.
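As a rough illustration of what such a plugin could look like, the Go sketch below defines a hypothetical extension point for the ReplicaScheduling stage and a plugin that places the whole job on one cluster or fails. Everything here (`ReplicaPlugin`, `ClusterCapacity`, `SingleClusterGang`, `ErrGangUnschedulable`) is an assumption made up for this example. Karmada does not expose such an interface today, which is exactly what this issue asks for.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterCapacity is a hypothetical view of a candidate cluster: its name and
// how many replicas of the workload it could run right now.
type ClusterCapacity struct {
	Name              string
	AvailableReplicas int32
}

// ReplicaPlugin is a hypothetical extension point for the ReplicaScheduling
// stage. It maps the desired replica count onto the candidate clusters.
type ReplicaPlugin interface {
	AssignReplicas(replicas int32, clusters []ClusterCapacity) (map[string]int32, error)
}

// ErrGangUnschedulable signals that no single cluster can hold the whole job,
// so scheduling should halt instead of splitting or partially placing it.
var ErrGangUnschedulable = errors.New("no single cluster can run all replicas")

// SingleClusterGang places all replicas on the first candidate cluster that
// can run every one of them. Candidates are assumed to arrive already
// filtered and ordered by score.
type SingleClusterGang struct{}

func (SingleClusterGang) AssignReplicas(replicas int32, clusters []ClusterCapacity) (map[string]int32, error) {
	if replicas <= 0 {
		return nil, fmt.Errorf("invalid replica count %d", replicas)
	}
	for _, c := range clusters {
		if c.AvailableReplicas >= replicas {
			return map[string]int32{c.Name: replicas}, nil
		}
	}
	return nil, ErrGangUnschedulable
}

func main() {
	var plugin ReplicaPlugin = SingleClusterGang{}

	// Hypothetical candidates for a 100-replica vcjob.
	clusters := []ClusterCapacity{
		{Name: "member1", AvailableReplicas: 50},
		{Name: "member2", AvailableReplicas: 120},
	}
	fmt.Println(plugin.AssignReplicas(100, clusters))

	// No cluster is large enough: the plugin halts scheduling.
	fmt.Println(plugin.AssignReplicas(200, clusters))
}
```

Returning an error rather than a partial assignment is the design choice that matters here: it lets the scheduler stop before any part of the vcjob is propagated.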
Karmada-scheduler has four scheduling phases, but currently only the first two phases (FilterCluster, ScoreCluster) support custom plugins. I believe it is a reasonable requirement for all four phases to support custom plugins, which would enhance the extensibility of the scheduler, enabling support for a wider range of scenarios. The current two extension points are insufficient.
Reference Material:
- https://docs.google.com/document/d/1l6zO4xf879KdW_WPS7aMED0SUmnk487_XDsC12TtuTQ/edit
- https://karmada.io/docs/next/developers/customize-karmada-scheduler/
Reconstructing the scheduler framework seems difficult to accomplish in a short time; even enabling plugin support for ReplicaScheduling alone is a complex and tedious task. I hope we can continue to advance this work in stages.
cc main owners
@RainbowMango @XiShanYongYe-Chang @chaunceyjiang