scheduler: Improve CSILimits plugin accuracy by using VolumeAttachments
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes a bug in the CSILimits scheduler plugin where volumes from deleted pods are not being counted, which results in over-scheduling.
The bug occurs because when a pod is deleted, its volumes are no longer considered in the current scheduling logic, but VolumeAttachments may still exist (which more accurately represent the actual state of attachments).
Which issue(s) this PR fixes:
Fixes https://github.com/kubernetes/kubernetes/issues/126502
Note for Reviewers
A volume is considered "fully detached" once the associated VolumeAttachment object is deleted from the API server: https://github.com/kubernetes/kubernetes/blob/4aeaf1e99e82da8334c0d6dddd848a194cd44b4f/pkg/volume/csi/csi_attacher.go#L444-L447
Does this PR introduce a user-facing change?
NONE
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
N/A
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/sig storage
/retest
cc: @jsafrane for review whenever you have some time 🙂
/assign
Discussed during sig-storage and we'll want to account for plugins which don't create VA objects / don't support ControllerPublishVolume RPC, I'll include that in next rev.
We've previously encountered this issue and an auto retry would resolve it(volumeattachment will ultimately be deleted). Can you please help to clarify if there's a specific reason to address this?
@mowangdk That is a good observation - typically the issue sorts itself out over time assuming the relevant VA object is eventually deleted. However, that is not a great assumption to make and is not ideal for several reasons. To be concise, the scheduler should make decisions based on accurate information that reflects the actual state of the world. As described in https://github.com/kubernetes/kubernetes/issues/126502, relying on eventual consistency as a crutch here leads to poor scheduling choices that introduces unnecessary delay.
From the user's perspective, its also quite confusing to randomly observe volume attachment failures that mysteriously resolve themselves after some time.
cc: @gnufied @jsafrane - PTAL, this is ready for review.
mostly lgtm
/lgtm
LGTM label has been added.
@mowangdk That is a good observation - typically the issue sorts itself out over time assuming the relevant VA object is eventually deleted. However, that is not a great assumption to make and is not ideal for several reasons. To be concise, the scheduler should make decisions based on accurate information that reflects the actual state of the world. As described in #126502, relying on eventual consistency as a crutch here leads to poor scheduling choices that introduces unnecessary delay.
From the user's perspective, its also quite confusing to randomly observe volume attachment failures that mysteriously resolve themselves after some time.
@torredil gotcha, thank you for your input. Does this PR potentially impact the autoscheduler in the cluster? My previous similar PR was blocked for this reason.
@mowangdk To the best of my knowledge, this PR does not affect CA in any way.
/approve
This PR is approved by sig-scheduling and sig-storage, but I believe it should be reviewed and approved by sig-auth as well.
cc: @deads2k @dims
/assign @liggitt @deads2k
the changes in bootstrappolicy seem straight forward to me :) but let's get the actual approve from sig-auth folks. if it gets closer to code freeze and does not land, i can take another peek at outstanding comments, please loop me back in then.
/approve
for role addition
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: gnufied, liggitt, sanposhiho, torredil
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~pkg/scheduler/OWNERS~~ [liggitt,sanposhiho]
- ~~plugin/pkg/auth/authorizer/OWNERS~~ [liggitt]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment