There is no plugin for podAffinity violations
Is your feature request related to a problem? Please describe.
I think it is a problem because, as far as I know, there is no way of rescheduling pods that are in violation of podAffinity.
Describe the solution you'd like
Make a plugin available that deals with this, right? 🥲
Describe alternatives you've considered
Right now I'm manually doing a rollout of the deployments so that the scheduling is right again, for example after upgrades.
What version of descheduler are you using?
descheduler version: latest
Additional context
Issue #1286 was closed as "Not planned", but feedback was provided there and I think this feature is very much needed.
Any news?
Hi @tuxillo, sorry but unfortunately we don't have much bandwidth for adding new plugins at this time. The descheduling framework refactor is meant to make it easier for anyone to develop their own plugins. The existing plugin code should serve as a guide, but we hope to publish a more generic developer guide in the future.
@damemi thanks for the update. No worries! I was wondering if this plugin was missing for any particular reason. I might give implementing it a try, but should I wait for the refactor you're mentioning?
@tuxillo please feel free to implement it and open a PR if you'd like! The refactor is already done. The goal of it was to make it so you don't even need to contribute the plugin code here if you don't want to; you could just build your own descheduler with your plugin imported.
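A rough skeleton of what such a standalone plugin could look like is below. The DeschedulePlugin interface and the plugin name are only stand-ins assumed for illustration; the real extension-point types live in the descheduler framework packages and may differ.

package podaffinity

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// DeschedulePlugin is a stand-in for the framework's descheduling extension
// point: inspect the given nodes and evict pods that violate some rule.
type DeschedulePlugin interface {
	Name() string
	Deschedule(ctx context.Context, nodes []*v1.Node) error
}

// RemovePodsViolatingPodAffinity is the hypothetical plugin discussed in this
// issue. It would walk each node's pods, re-evaluate their required
// podAffinity terms, and hand violators to the framework's evictor.
type RemovePodsViolatingPodAffinity struct{}

func (p *RemovePodsViolatingPodAffinity) Name() string {
	return "RemovePodsViolatingPodAffinity"
}

func (p *RemovePodsViolatingPodAffinity) Deschedule(ctx context.Context, nodes []*v1.Node) error {
	// 1. list the pods assigned to each node
	// 2. run a predicate such as CheckPodAffinityViolation (shown later in this thread)
	// 3. request eviction for each violating pod
	return nil
}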
I did an initial version using RemovePodsViolatingInterPodAntiAffinity as the base, but as it turns out I ended up just reversing its logic, which is most certainly the wrong approach. It also doesn't sound like it's worth a whole plugin.
Let me explain my use case:
I set application A to run on certain nodes. I want application B to always run on nodes where application A is, so for application B I use:
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
                - appA
        topologyKey: kubernetes.io/hostname
As you know, sometimes app A will drift to other nodes (Karpenter is autoscaling) but app B won't follow, so the affinity is broken.
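To make the drift concrete, here is a minimal detection sketch using plain client-go, outside the descheduler: it collects the nodes currently running appA and flags any appB pod scheduled elsewhere. The app.kubernetes.io/name=appB selector and the clientset wiring are assumptions on my side, based on the manifest above.

package affinitycheck

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// findDriftedAppBPods lists the nodes that currently run appA and reports any
// appB pod sitting on a node outside that set, i.e. a pod whose required
// podAffinity term from the manifest above is no longer satisfied.
func findDriftedAppBPods(ctx context.Context, client kubernetes.Interface) error {
	appA, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		LabelSelector: "app.kubernetes.io/name=appA",
	})
	if err != nil {
		return err
	}
	nodesWithAppA := map[string]bool{}
	for _, p := range appA.Items {
		if p.Spec.NodeName != "" {
			nodesWithAppA[p.Spec.NodeName] = true
		}
	}

	appB, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		LabelSelector: "app.kubernetes.io/name=appB",
	})
	if err != nil {
		return err
	}
	for _, p := range appB.Items {
		if p.Spec.NodeName != "" && !nodesWithAppA[p.Spec.NodeName] {
			// The affinity is requiredDuringScheduling only, so nothing will
			// move this pod unless something (e.g. a descheduler plugin) evicts it.
			fmt.Printf("%s/%s violates its podAffinity on node %s\n", p.Namespace, p.Name, p.Spec.NodeName)
		}
	}
	return nil
}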
What I am currently doing in predicates is:
func CheckPodAffinityViolation(candidatePod *v1.Pod, assignedPods map[string][]*v1.Pod, nodeMap map[string]*v1.Node) bool {
	nodeHavingCandidatePod, ok := nodeMap[candidatePod.Spec.NodeName]
	if !ok {
		klog.Warningf("CandidatePod %s does not exist in nodeMap", klog.KObj(candidatePod))
		return false
	}
	affinity := candidatePod.Spec.Affinity
	if affinity == nil || affinity.PodAffinity == nil {
		return false
	}
	for _, term := range GetPodAffinityTerms(affinity.PodAffinity) {
		namespaces := GetNamespacesFromPodAffinityTerm(candidatePod, &term)
		selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
		if err != nil {
			klog.ErrorS(err, "Unable to convert LabelSelector into Selector")
			return false
		}
		for namespace := range namespaces {
			for _, assignedPod := range assignedPods[namespace] {
				if assignedPod.Name == candidatePod.Name || !PodMatchesTermsNamespaceAndSelector(assignedPod, namespaces, selector) {
					klog.V(4).InfoS("CandidatePod does not match inter-pod affinity rule of assigned pod on node", "candidatePod", klog.KObj(candidatePod), "assignedPod", klog.KObj(assignedPod))
					continue
				}
				nodeHavingAssignedPod, ok := nodeMap[assignedPod.Spec.NodeName]
				if !ok {
					continue
				}
				// A matching pod runs in the same topology domain, so this
				// affinity term is satisfied and the candidate is not evicted.
				if hasSameLabelValue(nodeHavingCandidatePod, nodeHavingAssignedPod, term.TopologyKey) {
					klog.V(1).InfoS("CandidatePod matches inter-pod affinity rule of assigned pod on node", "candidatePod", klog.KObj(candidatePod), "assignedPod", klog.KObj(assignedPod))
					return false
				}
			}
		}
	}
	// No assigned pod satisfied any of the candidate's affinity terms: violation.
	klog.V(1).Infof("CandidatePod did not match inter-pod affinity in node %v", candidatePod.Spec.NodeName)
	return true
}
So when the candidate pod and a matching pod are on the same node (since that's the topology key), it just doesn't get evicted. I think I'm missing some parts of the problem or not following the correct approach.
Any pointers would be appreciated 😄
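For context, this is roughly how I imagine the inputs of CheckPodAffinityViolation being built and the predicate driven over the whole cluster, assuming a plain client-go clientset; what to do with the returned violators (the actual eviction) would be left to the framework's evictor.

package affinitycheck

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// findAffinityViolators builds the nodeMap and per-namespace assignedPods
// structures that CheckPodAffinityViolation expects and returns every
// scheduled pod the predicate flags as violating its podAffinity.
func findAffinityViolators(ctx context.Context, client kubernetes.Interface) ([]*v1.Pod, error) {
	nodeList, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	nodeMap := map[string]*v1.Node{}
	for i := range nodeList.Items {
		nodeMap[nodeList.Items[i].Name] = &nodeList.Items[i]
	}

	podList, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	assignedPods := map[string][]*v1.Pod{}
	for i := range podList.Items {
		p := &podList.Items[i]
		if p.Spec.NodeName == "" {
			continue // skip pods that are not scheduled yet
		}
		assignedPods[p.Namespace] = append(assignedPods[p.Namespace], p)
	}

	var violators []*v1.Pod
	for _, pods := range assignedPods {
		for _, p := range pods {
			if CheckPodAffinityViolation(p, assignedPods, nodeMap) {
				violators = append(violators, p)
			}
		}
	}
	return violators, nil
}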
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Any pointers would be appreciated 😄
Anyone? :)
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.