descheduler
Descheduling framework: define framework types
Based on comments from https://github.com/kubernetes-sigs/descheduler/issues/753:
- descheduler configuration with profiles (internal/v1alpha2) with conversion/validation/defaulting
- handle for injecting clientset, evictor, shared informer factory, etc.
- plugin extension point interfaces
- example strategy arguments + defaulting/validation
Use cases to keep in mind:
- The descheduling framework can be configured: A user needs to be allowed to specify which extension points are enabled/disabled, alongside specifying plugin configuration. Including ability to configure multiple descheduling profiles.
- The descheduling framework configuration can be converted into an internal representation (e.g. v1alpha2 -> internal): This provides the ability to introduce/remove features by increasing configuration versions while still having a path to convert every configuration version into a single internal representation, so the framework internals are kept as stable as possible
- The descheduling framework configuration gets defaulted: To reduce need to specify value for every possible configuration, also defaulting serves as a recommended/opinionated settings for the plugins
- The descheduling framework needs to provide basic handle for plugins: The framework needs to set responsibility for/ownership of lifecycle of processes/workflows, i.e. the framework is responsible for initializing clientset, node lister, pod lister, evictor, etc. The framework handle types need to provide basic handles while still keeping the abstraction level high.
- The descheduling plugin interface is correctly specified: A plugin is an encapsulation of a user eviction decision making logic. It defines a boundary/communication protocol. Thus, it is important to clearly but simply define how the plugin is communicating with the framework. E.g. specifying the extension points.
- The descheduling plugins are available to the framework through plugin registry: The framework needs to provide an easy way of registering in-tree/out-of-tree plugins.
- The descheduling framework code hierarchy is flexible: The framework code needs to be properly divided so it is clear, easily extendable and mocked/testable.
- Easy to read examples on how to build a plugin: Provide a few examples of plugin implementations
Overall, the task in question here is to define the data model of the framework. The functionality itself is part of the next issue(s).
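To make the extension-point idea above concrete, here is a minimal Go sketch of what the plugin interfaces could look like. All names here (Plugin, DeschedulePlugin, BalancePlugin) are illustrative assumptions modeled on the scheduler framework, not the final descheduler API:

```go
package main

import "fmt"

// Plugin is the base interface every plugin would implement.
type Plugin interface {
	Name() string
}

// DeschedulePlugin evicts pods without considering other pods
// (the "deschedule" extension point from the proposal).
type DeschedulePlugin interface {
	Plugin
	Deschedule(nodes []string) error
}

// BalancePlugin evicts pods while balancing distribution
// (the "rebalance" extension point from the proposal).
type BalancePlugin interface {
	Plugin
	Balance(nodes []string) error
}

// tooManyRestarts is a toy plugin implementing the deschedule
// extension point; the real plugin would receive a framework handle.
type tooManyRestarts struct{}

func (p *tooManyRestarts) Name() string { return "PodsHavingTooManyRestarts" }

func (p *tooManyRestarts) Deschedule(nodes []string) error {
	for _, n := range nodes {
		fmt.Printf("checking pods on %s\n", n)
	}
	return nil
}

func main() {
	var p DeschedulePlugin = &tooManyRestarts{}
	fmt.Println(p.Name())
	_ = p.Deschedule([]string{"node-1", "node-2"})
}
```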
Prerequisites:
- [x] split the PodEvictor into the evictor and filter: https://github.com/kubernetes-sigs/descheduler/pull/847
- [x] simplify the pod evictor so it can be simply injected and invoked inside strategies: https://github.com/kubernetes-sigs/descheduler/pull/846
Strategies for migration:
- [x] RemoveDuplicates
- [x] LowNodeUtilization
- [x] HighNodeUtilization
- [x] RemovePodsViolatingInterPodAntiAffinity
- [x] RemovePodsViolatingNodeAffinity
- [x] RemovePodsViolatingNodeTaints: @ingvagabund https://github.com/kubernetes-sigs/descheduler/pull/857
- [x] RemovePodsHavingTooManyRestarts
- [x] PodLifeTime
- [x] RemovePodsViolatingTopologySpreadConstraint
- [x] RemoveFailedPods
/help
@ingvagabund: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Why are we solving this issue?
To have the descheduling framework implemented/designed through the community effort
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Completely new code is to be implemented, including the directory structure of the framework. To be discussed as part of the resolution.
Does this issue have zero to low barrier of entry?
Not really. The discussion is the primary source of the changes.
How can the assignee reach out to you for help?
CC'ing @ingvagabund
/cc
/cc @ingvagabund Would love to help
/cc
Could it be possible to separate this into several tasks or steps so we can cooperate together?
@ingvagabund can we refer to https://github.com/kubernetes-sigs/descheduler/pull/781 for approaching this? I'm guessing it must have some similar work. /cc @ingvagabund
@ingvagabund can we refer to https://github.com/kubernetes-sigs/descheduler/pull/781 for approaching this? I'm guessing it must have some similar work.
I am using #781 as a reference for suggesting what we need. Nevertheless, it is still a proof of concept. Please treat it as such.
I think the 2 big things are configuration and plugins. So, to start I would propose two main subtasks:
- Finalize plugin API
- Convert existing strategies to plugins
Both of these should be seamless (in terms of not breaking functionality mid-move or delaying our regular release cadence if we hit trouble), so we can migrate plugins one-by-one if we choose. We can use wrapper functions around the existing strategy functions to shim the migration until it's ready to shift over if necessary, which is how the scheduling framework was implemented.
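The wrapper-function shim described above could look roughly like this in Go. The function and type names are hypothetical stand-ins, not the real descheduler code; the point is that the legacy strategy's decision logic is delegated to unchanged during the migration:

```go
package main

import "fmt"

// legacyRemoveFailedPods stands in for an existing strategy function
// with its original (simplified) signature.
func legacyRemoveFailedPods(nodes []string) error {
	fmt.Printf("legacy strategy ran over %d nodes\n", len(nodes))
	return nil
}

// DeschedulePlugin is the (assumed) plugin-side interface the
// framework expects.
type DeschedulePlugin interface {
	Name() string
	Deschedule(nodes []string) error
}

// removeFailedPodsShim wraps the legacy function so it satisfies the
// plugin interface; the strategy logic itself stays untouched.
type removeFailedPodsShim struct{}

func (s *removeFailedPodsShim) Name() string { return "RemoveFailedPods" }

func (s *removeFailedPodsShim) Deschedule(nodes []string) error {
	return legacyRemoveFailedPods(nodes) // delegate unchanged
}

func main() {
	var p DeschedulePlugin = &removeFailedPodsShim{}
	fmt.Println(p.Name())
	_ = p.Deschedule([]string{"node-1"})
}
```

Once the plugin API settles, the shim can be deleted and the logic moved into the plugin proper, which is how the scheduling framework handled the same transition.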
The first PR here then would be v1alpha2 of the DeschedulerPolicy API. @ingvagabund would you like to open that up for discussion based on the outline in your design doc?
That PR can come with a matching migration for the first strategy we want to tackle. This can be any strategy, but preferably a separate PR.
At that point, we can assign out the work to transition the remaining strategies to plugins. As we make progress on that, we can follow back up with the remaining tasks here and wire everything together. Does that sound reasonable?
Hey, I'm new here (and to some aspects of k8s development). I might have a few stupid questions, I wanna get them out of the way here:
The descheduling framework configuration can be converted into an internal representation (e.g. v1alpha2 -> internal)
Here we are assuming changes from v1alpha1 to v1alpha2 that will keep the same behavior, right? (and it makes sense, since it is stable)
But the idea is also to make this project stop reading from files and be a controller reading CRDs from your cluster, am I correct? (with informers and all that gist)
With that in mind, shouldn't the conversion happen with a conversion webhook so any call [get,update,create,etc] to those resources also see the conversion? (not only the descheduler)
The descheduling framework needs to provide basic handle for plugins
~~Are these runtime plugins that users can point to with workload running (shared lib style)? Or plugins that users can register in code and build their own images?~~ EDIT: talked with @ingvagabund, and also checked the wip PR, and these are plugins following similar approach for plugin registry as the ones from the scheduler framework (which I was not familiar with).
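A plugin registry in the style of the scheduler framework, as referenced in the comment above, could be sketched like this. The names and signatures are assumptions; in the real framework the factory would also receive plugin args and a framework handle:

```go
package main

import "fmt"

// Plugin is the assumed base plugin interface.
type Plugin interface {
	Name() string
}

// PluginFactory constructs a plugin instance.
type PluginFactory func() (Plugin, error)

// Registry maps plugin names to factories, so in-tree and
// out-of-tree plugins are registered the same way.
type Registry map[string]PluginFactory

// Register adds a factory, rejecting duplicate names.
func (r Registry) Register(name string, factory PluginFactory) error {
	if _, ok := r[name]; ok {
		return fmt.Errorf("plugin %q is already registered", name)
	}
	r[name] = factory
	return nil
}

// noop is a trivial plugin used for demonstration.
type noop struct{}

func (n *noop) Name() string { return "Noop" }

func main() {
	r := Registry{}
	if err := r.Register("Noop", func() (Plugin, error) { return &noop{}, nil }); err != nil {
		panic(err)
	}
	// Re-registering the same name fails.
	fmt.Println(r.Register("Noop", nil))
}
```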
I see the proposed v1alpha2 of DeschedulerPolicy here:
```yaml
apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
  - name: ProfileName
    pluginConfig:
      - name: DefaultEvictor
        args:
          evictSystemCriticalPods: true # default false
          evictFailedBarePods: true # default false
          evictLocalStoragePods: true # default false
          nodeFit: true # default false
      - name: PodsHavingTooManyRestarts
        args:
          podRestartThreshold: 100
          includingInitContainers: true
      - name: LowNodeUtilization
        args:
          lowThreshold:
            cpu: 20
            memory: 20
            pods: 20
          highThreshold:
            cpu: 50
            memory: 50
            pods: 50
      - name: TopologySpreadConstraint
        args:
          includeSoftConstraints: true
    plugins:
      filter:
        # default filters: DefaultEvictor
        disabled:
        enabled:
      sort:
        # default sorters: Priority
        disabled:
          - Priority
        enabled:
          - Utilization
      deschedule:
        # plugins descheduling pods without checking other pods
        enabled:
          - PodsHavingTooManyRestarts
          - InterPodAntiAffinity
      rebalance:
        # plugins descheduling pods while balancing distribution
        enabled:
          - Duplicates
          - LowNodeUtilization
          - TopologySpreadConstraint
      evict:
        # default evict plugins: DefaultEvictor
        disabled:
          - DefaultEvictor
        enabled:
          - CustomEvictor
```
It has group, version, and kind, but no spec and status. I am now not sure if the idea is for the descheduler to be a controller also watching DeschedulerPolicy resources. Is it still the idea for this to be a config file? (this also may render my 1st question invalid :sweat_smile: )
But the idea is also to make this project stop reading from files and be a controller reading CRDs from your cluster, am I correct? (with informers and all that gist)
There's no plan to make the config a CRD, you might have seen some mention of informers as a way to trigger descheduling based on cluster events (for example, a node being added). But right now there's no need for a CRD
@damemi Great, now it's clear, thanks!
Here we are assuming changes from v1alpha1 to v1alpha2 that will keep the same behavior, right? (and it makes sense, since it is stable)
In v1alpha2 we will be introducing new functionality in the strategies, e.g. sorting pods. The expressive power of the configuration will go up a bit, though the original decision logic of all the strategies will stay unchanged in the core. We can discuss specific changes on a per-PR basis as we migrate the strategies into plugins.
But the idea is also to make this project stop reading from files and be a controller reading CRDs from your cluster, am I correct? (with informers and all that gist)
In the past there was an idea of sharing the configuration of all controllers (including kubelets) in the cluster as objects so the configuration could be updated/read on the fly by higher-level tooling. I'm not sure what the current state of that effort is. In the case of the scheduler and the descheduler, the configuration can be stored in the cluster through a config map, which contains a YAML representation of a well-defined type with kind and apiVersion. Yet both the scheduler and the descheduler still read the configuration as a file whose location is passed in through flags.
- Finalize plugin API
- Convert existing strategies to plugins
+1 for this as the starting point. I will go ahead and prepare a PR where we can take a closer look at both the API and one of the strategies migrated into a plugin as an example for the remaining strategies.
https://github.com/kubernetes-sigs/descheduler/issues/845
These proposals for changes in DeschedulerPolicy regarding fields that don't make sense in certain strategies will have to be considered when creating v1alpha2 I assume
@eminaktas @harshanarayana @JaneLiuL @pravarag @damemi any review help in https://github.com/kubernetes-sigs/descheduler/pull/847 appreciated
At that point, we can assign out the work to transition the remaining strategies to plugins.
I'm late to the party but count me in!
I'm in as well!
I assume the v1alpha2 from the proposal is already good enough since we did not get any more comments on it?
I assume the v1alpha2 from the proposal is already good enough since we did not get any more comments on it?
Yup, we're going ahead with the proposed design as our outline for the work. If we need to change anything (like something we didn't think about comes up) we can be flexible with it along the way
@damemi @ingvagabund if I want to work on any of the above mentioned strategies here: https://github.com/kubernetes-sigs/descheduler/issues/837#issue-1261828383, shall I just claim directly or these are to be approached one after the other only?
@pravarag feel free to claim some and start opening PRs to move them. I think @ingvagabund is on vacation right now, so we won't be able to update the list with who was which ones, but we can keep track.
The first few will probably take some review but once we have the pattern established the rest should go easy.
Hey, I'll be working on migrating the nodeAffinity strategy. Should have a PR soon.
EDIT: PR here: https://github.com/kubernetes-sigs/descheduler/pull/860
I can take FailedPods and PodLifeTime. Should have PRs soon
RemoveFailedPods PR #861
I'll be happy to take LowNodeUtilization and HighNodeUtilization if they are not already being worked on.
I can take RemovePodsViolatingTopologySpreadConstraint. Should be able to get a pr open next week.
I can take RemoveDuplicates and RemovePodsHavingTooManyRestarts. Will get a PR up soon.
I am grabbing and working on RemovePodsViolatingInterPodAntiAffinity next
It seems PodLifeTime is still unclaimed; please ping me if you need my help on this.