kubernetes icon indicating copy to clipboard operation
kubernetes copied to clipboard

scheduler: Improve CSILimits plugin accuracy by using VolumeAttachments

Open torredil opened this issue 1 year ago • 14 comments

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR fixes a bug in the CSILimits scheduler plugin where volumes from deleted pods are not being counted, which results in over-scheduling.

The bug occurs because when a pod is deleted, its volumes are no longer considered in the current scheduling logic, but VolumeAttachments may still exist (which more accurately represent the actual state of attachments).

Which issue(s) this PR fixes:

Fixes https://github.com/kubernetes/kubernetes/issues/126502

Note for Reviewers

A volume is considered "fully detached" once the associated VolumeAttachment object is deleted from the API server: https://github.com/kubernetes/kubernetes/blob/4aeaf1e99e82da8334c0d6dddd848a194cd44b4f/pkg/volume/csi/csi_attacher.go#L444-L447

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

N/A

torredil avatar Sep 30 '24 13:09 torredil

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Sep 30 '24 13:09 k8s-ci-robot

/sig storage

torredil avatar Sep 30 '24 13:09 torredil

/retest

torredil avatar Sep 30 '24 20:09 torredil

cc: @jsafrane for review whenever you have some time 🙂

torredil avatar Oct 01 '24 17:10 torredil

/assign

gnufied avatar Oct 03 '24 16:10 gnufied

Discussed during sig-storage and we'll want to account for plugins which don't create VA objects / don't support ControllerPublishVolume RPC, I'll include that in next rev.

torredil avatar Oct 03 '24 16:10 torredil

We've previously encountered this issue and an auto retry would resolve it(volumeattachment will ultimately be deleted). Can you please help to clarify if there's a specific reason to address this?

mowangdk avatar Oct 05 '24 06:10 mowangdk

@mowangdk That is a good observation - typically the issue sorts itself out over time assuming the relevant VA object is eventually deleted. However, that is not a great assumption to make and is not ideal for several reasons. To be concise, the scheduler should make decisions based on accurate information that reflects the actual state of the world. As described in https://github.com/kubernetes/kubernetes/issues/126502, relying on eventual consistency as a crutch here leads to poor scheduling choices that introduces unnecessary delay.

From the user's perspective, its also quite confusing to randomly observe volume attachment failures that mysteriously resolve themselves after some time.

torredil avatar Oct 15 '24 20:10 torredil

cc: @gnufied @jsafrane - PTAL, this is ready for review.

torredil avatar Oct 15 '24 20:10 torredil

mostly lgtm

gnufied avatar Oct 18 '24 18:10 gnufied

/lgtm

gnufied avatar Oct 18 '24 20:10 gnufied

LGTM label has been added.

Git tree hash: b536ad999ebd9e27f986ab997c898d68f550e704

k8s-ci-robot avatar Oct 18 '24 20:10 k8s-ci-robot

@mowangdk That is a good observation - typically the issue sorts itself out over time assuming the relevant VA object is eventually deleted. However, that is not a great assumption to make and is not ideal for several reasons. To be concise, the scheduler should make decisions based on accurate information that reflects the actual state of the world. As described in #126502, relying on eventual consistency as a crutch here leads to poor scheduling choices that introduces unnecessary delay.

From the user's perspective, its also quite confusing to randomly observe volume attachment failures that mysteriously resolve themselves after some time.

@torredil gotcha, thank you for your input. Does this PR potentially impact the autoscheduler in the cluster? My previous similar PR was blocked for this reason.

mowangdk avatar Oct 20 '24 08:10 mowangdk

@mowangdk To the best of my knowledge, this PR does not affect CA in any way.

torredil avatar Oct 21 '24 14:10 torredil

/approve

gnufied avatar Oct 21 '24 15:10 gnufied

This PR is approved by sig-scheduling and sig-storage, but I believe it should be reviewed and approved by sig-auth as well.

cc: @deads2k @dims

torredil avatar Oct 23 '24 13:10 torredil

/assign @liggitt @deads2k

the changes in bootstrappolicy seem straight forward to me :) but let's get the actual approve from sig-auth folks. if it gets closer to code freeze and does not land, i can take another peek at outstanding comments, please loop me back in then.

dims avatar Oct 23 '24 15:10 dims

/approve

for role addition

liggitt avatar Oct 23 '24 16:10 liggitt

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied, liggitt, sanposhiho, torredil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Oct 23 '24 16:10 k8s-ci-robot