descheduler
Descheduling framework: define framework types
Based on comments from https://github.com/kubernetes-sigs/descheduler/issues/753:
- descheduler configuration with profiles (internal/v1alpha2) with conversion/validation/defaulting
- handle for injecting clientset, evictor, shared informer factory, etc.
- plugin extension point interfaces
- example strategy arguments + defaulting/validation
Use cases to keep in mind:
- The descheduling framework can be configured: A user needs to be allowed to specify which extension points are enabled/disabled, alongside specifying plugin configuration. Including ability to configure multiple descheduling profiles.
- The descheduling framework configuration can be converted into an internal representation (e.g. v1alpha2 -> internal): This provides the ability to introduce/remove features by increasing configuration versions while still having a path to convert every configuration version into a single internal representation, so the framework internals are kept as stable as possible
- The descheduling framework configuration gets defaulted: To reduce need to specify value for every possible configuration, also defaulting serves as a recommended/opinionated settings for the plugins
- The descheduling framework needs to provide basic handle for plugins: The framework needs to set responsibility for/ownership of lifecycle of processes/workflows, i.e. the framework is responsible for initializing clientset, node lister, pod lister, evictor, etc. The framework handle types need to provide basic handles while still keeping the abstraction level high.
- The descheduling plugin interface is correctly specified: A plugin is an encapsulation of a user eviction decision making logic. It defines a boundary/communication protocol. Thus, it is important to clearly but simply define how the plugin is communicating with the framework. E.g. specifying the extension points.
- The descheduling plugins are available to the framework through plugin registry: The framework needs to provide an easy way of registering in-tree/out-of-tree plugins.
- The descheduling framework code hierarchy is flexible: The framework code needs to be properly divided so it is clear, easily extendable and mocked/testable.
- Easy to read examples on how to build a plugin: Provide a few examples of plugin implementations
Overall, the task in question here is to define the data model of the framework. The functionality itself is part of the next issue(s).
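To make the extension-point idea above concrete, here is a minimal Go sketch of what the plugin interfaces could look like. All names here (Plugin, DeschedulePlugin, BalancePlugin) are illustrative assumptions modeled on the scheduler framework, not the final descheduler API:

```go
package main

import "fmt"

// Plugin is the base interface every plugin would implement.
type Plugin interface {
	Name() string
}

// DeschedulePlugin evicts pods without considering other pods
// (the "deschedule" extension point from the proposal).
type DeschedulePlugin interface {
	Plugin
	Deschedule(nodes []string) error
}

// BalancePlugin evicts pods while balancing distribution
// (the "rebalance" extension point from the proposal).
type BalancePlugin interface {
	Plugin
	Balance(nodes []string) error
}

// tooManyRestarts is a toy plugin implementing the deschedule
// extension point; the real plugin would receive a framework handle.
type tooManyRestarts struct{}

func (p *tooManyRestarts) Name() string { return "PodsHavingTooManyRestarts" }

func (p *tooManyRestarts) Deschedule(nodes []string) error {
	for _, n := range nodes {
		fmt.Printf("checking pods on %s\n", n)
	}
	return nil
}

func main() {
	var p DeschedulePlugin = &tooManyRestarts{}
	fmt.Println(p.Name())
	_ = p.Deschedule([]string{"node-1", "node-2"})
}
```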
Prerequisites:
- [x] split the PodEvictor into the evictor and filter: https://github.com/kubernetes-sigs/descheduler/pull/847
- [x] simplify the pod evictor so it can be simply injected and invoked inside strategies: https://github.com/kubernetes-sigs/descheduler/pull/846
Strategies for migration:
- [x] RemoveDuplicates
- [x] LowNodeUtilization
- [x] HighNodeUtilization
- [x] RemovePodsViolatingInterPodAntiAffinity
- [x] RemovePodsViolatingNodeAffinity
- [x] RemovePodsViolatingNodeTaints: @ingvagabund https://github.com/kubernetes-sigs/descheduler/pull/857
- [x] RemovePodsHavingTooManyRestarts
- [x] PodLifeTime
- [x] RemovePodsViolatingTopologySpreadConstraint
- [x] RemoveFailedPods
/help
@ingvagabund: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Why are we solving this issue?
To have the descheduling framework implemented/designed through the community effort
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Completely new code is to be implemented, including the directory structure of the framework. To be discussed as part of the resolution.
Does this issue have zero to low barrier of entry?
Not really. The discussion is the primary source of the changes.
How can the assignee reach out to you for help?
CC'ing @ingvagabund
/cc
/cc @ingvagabund Would love to help
/cc
Could it be possible to separate this into several tasks or steps so we can cooperate together?
@ingvagabund can we refer to https://github.com/kubernetes-sigs/descheduler/pull/781 for approaching this? I'm guessing it must have some similar work. /cc @ingvagabund
@ingvagabund can we refer to https://github.com/kubernetes-sigs/descheduler/pull/781 for approaching this? I'm guessing it must have some similar work.
I am using #781 as a reference for suggesting what we need. Nevertheless, it is still a proof of concept. Please treat it as such.
I think the 2 big things are configuration and plugins. So, to start I would propose two main subtasks:
- Finalize plugin API
- Convert existing strategies to plugins
Both of these should be seamless (in terms of not breaking functionality mid-move or delaying our regular release cadence if we hit trouble), so we can migrate plugins one-by-one if we choose. We can use wrapper functions around the existing strategy functions to shim the migration until it's ready to shift over if necessary, which is how the scheduling framework was implemented.
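The wrapper-function shim described above could look roughly like this in Go. The function and type names are hypothetical stand-ins, not the real descheduler code; the point is that the legacy strategy's decision logic is delegated to unchanged during the migration:

```go
package main

import "fmt"

// legacyRemoveFailedPods stands in for an existing strategy function
// with its original (simplified) signature.
func legacyRemoveFailedPods(nodes []string) error {
	fmt.Printf("legacy strategy ran over %d nodes\n", len(nodes))
	return nil
}

// DeschedulePlugin is the (assumed) plugin-side interface the
// framework expects.
type DeschedulePlugin interface {
	Name() string
	Deschedule(nodes []string) error
}

// removeFailedPodsShim wraps the legacy function so it satisfies the
// plugin interface; the strategy logic itself stays untouched.
type removeFailedPodsShim struct{}

func (s *removeFailedPodsShim) Name() string { return "RemoveFailedPods" }

func (s *removeFailedPodsShim) Deschedule(nodes []string) error {
	return legacyRemoveFailedPods(nodes) // delegate unchanged
}

func main() {
	var p DeschedulePlugin = &removeFailedPodsShim{}
	fmt.Println(p.Name())
	_ = p.Deschedule([]string{"node-1"})
}
```

Once the plugin API settles, the shim can be deleted and the logic moved into the plugin proper, which is how the scheduling framework handled the same transition.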
The first PR here then would be v1alpha2 of the DeschedulerPolicy API. @ingvagabund would you like to open that up for discussion based on the outline in your design doc?
That PR can come with a matching migration for the first strategy we want to tackle. This can be any strategy, but preferably a separate PR.
At that point, we can assign out the work to transition the remaining strategies to plugins. As we make progress on that, we can follow back up with the remaining tasks here and wire everything together. Does that sound reasonable?
Hey, I'm new here (and to some aspects of k8s development). I might have a few stupid questions, I wanna get them out of the way here:
The descheduling framework configuration can be converted into an internal representation (e.g. v1alpha2 -> internal)
Here we are assuming changes from v1alpha1 to v1alpha2 that will keep the same behavior, right? (and it makes sense, since it is stable)
But the idea is also to make this project stop reading from files and be a controller reading CRDs from your cluster, am I correct? (with informers and all that gist)
With that in mind, shouldn't the conversion happen with a conversion webhook so any call [get,update,create,etc] to those resources also see the conversion? (not only the descheduler)
The descheduling framework needs to provide basic handle for plugins
~~Are these runtime plugins that users can point to with workload running (shared lib style)? Or plugins that users can register in code and build their own images?~~ EDIT: talked with @ingvagabund, and also checked the wip PR, and these are plugins following similar approach for plugin registry as the ones from the scheduler framework (which I was not familiar with).
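A plugin registry in the style of the scheduler framework, as referenced in the comment above, could be sketched like this. The names and signatures are assumptions; in the real framework the factory would also receive plugin args and a framework handle:

```go
package main

import "fmt"

// Plugin is the assumed base plugin interface.
type Plugin interface {
	Name() string
}

// PluginFactory constructs a plugin instance.
type PluginFactory func() (Plugin, error)

// Registry maps plugin names to factories, so in-tree and
// out-of-tree plugins are registered the same way.
type Registry map[string]PluginFactory

// Register adds a factory, rejecting duplicate names.
func (r Registry) Register(name string, factory PluginFactory) error {
	if _, ok := r[name]; ok {
		return fmt.Errorf("plugin %q is already registered", name)
	}
	r[name] = factory
	return nil
}

// noop is a trivial plugin used for demonstration.
type noop struct{}

func (n *noop) Name() string { return "Noop" }

func main() {
	r := Registry{}
	if err := r.Register("Noop", func() (Plugin, error) { return &noop{}, nil }); err != nil {
		panic(err)
	}
	// Re-registering the same name fails.
	fmt.Println(r.Register("Noop", nil))
}
```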
I see the proposed v1alpha2 of DeschedulerPolicy here:
```yaml
apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
  - name: ProfileName
    pluginConfig:
      - name: DefaultEvictor
        args:
          evictSystemCriticalPods: true # default false
          evictFailedBarePods: true # default false
          evictLocalStoragePods: true # default false
          nodeFit: true # default false
      - name: PodsHavingTooManyRestarts
        args:
          podRestartThreshold: 100
          includingInitContainers: true
      - name: LowNodeUtilization
        args:
          lowThreshold:
            cpu: 20
            memory: 20
            pods: 20
          highThreshold:
            cpu: 50
            memory: 50
            pods: 50
      - name: TopologySpreadConstraint
        args:
          includeSoftConstraints: true
    plugins:
      filter:
        # default filters: DefaultEvictor
        disabled:
        enabled:
      sort:
        # default sorters: Priority
        disabled:
          - Priority
        enabled:
          - Utilization
      deschedule:
        # plugins descheduling pods without checking other pods
        enabled:
          - PodsHavingTooManyRestarts
          - InterPodAntiAffinity
      rebalance:
        # plugins descheduling pods while balancing distribution
        enabled:
          - Duplicates
          - LowNodeUtilization
          - TopologySpreadConstraint
      evict:
        # default evict plugins: DefaultEvictor
        disabled:
          - DefaultEvictor
        enabled:
          - CustomEvictor
```
It has group, version, and kind, but no spec and status. I am now not sure if the idea is for the descheduler to be a controller also watching DeschedulerPolicy resources. Is it still the idea for this to be a config file? (this also may render my 1st question invalid :sweat_smile: )
But the idea is also to make this project stop reading from files and be a controller reading CRDs from your cluster, am I correct? (with informers and all that gist)
There's no plan to make the config a CRD, you might have seen some mention of informers as a way to trigger descheduling based on cluster events (for example, a node being added). But right now there's no need for a CRD
@damemi Great, now it's clear, thanks!
Here we are assuming changes from v1alpha1 to v1alpha2 that will keep the same behavior, right? (and it makes sense, since it is stable)
In v1alpha2 we will be introducing new functionality in the strategies, e.g. sorting pods. The expressive power of the configuration will go up a bit, though the original decision logic of all the strategies will stay unchanged in the core. We can discuss specific changes on a per-PR basis as we migrate the strategies into plugins.
But the idea is also to make this project stop reading from files and be a controller reading CRDs from your cluster, am I correct? (with informers and all that gist)
In the past there was an idea of sharing the configuration of all controllers (including kubelets) in the cluster as objects so the configuration could be updated/read on the fly by higher-level tooling. I'm not sure what the current state of that effort is. In the case of the scheduler and the descheduler, the configuration can be stored in the cluster through a config map, which contains a YAML representation of a well-defined type with kind and apiVersion. Yet both the scheduler and the descheduler still read the configuration as a file whose location is passed in through flags.
- Finalize plugin API
- Convert existing strategies to plugins
+1 for this as the starting point. I will go ahead and prepare a PR where we can take a closer look at both the API and one of the strategies migrated into a plugin as an example for the remaining strategies.
https://github.com/kubernetes-sigs/descheduler/issues/845
These proposals for changes in DeschedulerPolicy regarding fields that don't make sense in certain strategies will have to be considered when creating v1alpha2 I assume
@eminaktas @harshanarayana @JaneLiuL @pravarag @damemi any review help in https://github.com/kubernetes-sigs/descheduler/pull/847 appreciated
At that point, we can assign out the work to transition the remaining strategies to plugins.
I'm late to the party but count me in!
I'm in as well!
I assume the v1alpha2 from the proposal is already good enough since we did not get any more comments on it?
I assume the v1alpha2 from the proposal is already good enough since we did not get any more comments on it?
Yup, we're going ahead with the proposed design as our outline for the work. If we need to change anything (like something we didn't think about comes up) we can be flexible with it along the way
@damemi @ingvagabund if I want to work on any of the above mentioned strategies here: https://github.com/kubernetes-sigs/descheduler/issues/837#issue-1261828383, shall I just claim directly or these are to be approached one after the other only?
@pravarag feel free to claim some and start opening PRs to move them. I think @ingvagabund is on vacation right now, so we won't be able to update the list with who was which ones, but we can keep track.
The first few will probably take some review but once we have the pattern established the rest should go easy.
Hey, I'll be working on migrating the nodeAffinity strategy. Should have a PR soon.
EDIT: PR here: https://github.com/kubernetes-sigs/descheduler/pull/860
I can take FailedPods and PodLifeTime. Should have PRs soon
RemoveFailedPods PR #861
I'll be happy to take LowNodeUtilization and HighNodeUtilization if they are not already being worked on.
I can take RemovePodsViolatingTopologySpreadConstraint. Should be able to get a pr open next week.
I can take RemoveDuplicates and RemovePodsHavingTooManyRestarts. Will get a PR up soon.
I am grabbing and working on RemovePodsViolatingInterPodAntiAffinity next
It seems PodLifeTime is still unclaimed; please ping me if you need my help on this.