kubernetes icon indicating copy to clipboard operation
kubernetes copied to clipboard

pod admission checks whether an extended resource exists in node allocatable, instead of checking whether the allocatable is zero

Open yliaog opened this issue 2 weeks ago • 6 comments

What happened?

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/lifecycle/predicate.go#L357

it checks whether the extended resource exists in node allocatable

What did you expect to happen?

it should check whether the extended resource in node allocatable is zero or not exists

How can we reproduce it (as minimally and precisely as possible)?

unit test can reproduce it

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
# paste output here

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

yliaog avatar Dec 08 '25 19:12 yliaog

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Dec 08 '25 19:12 k8s-ci-robot

/wg device-management

yliaog avatar Dec 08 '25 19:12 yliaog

@bart0sh could you take a look at this?

yliaog avatar Dec 08 '25 19:12 yliaog

/assign

bart0sh avatar Dec 09 '25 08:12 bart0sh

it should check whether the extended resource in node allocatable is zero or not exists

In this case all extended resources handled by device plugins will be filtered out, which most probably will break something. I'd say that it should check if node allocatable exists (which it checks currently) and is not zero.

However, I still don't understand why all e2e tests pass. Theoretically all of the test pods should fail node admission if device plugin was never running on the node. I'll dig deeper.

unit test can reproduce it

Which ones?

All unit and e2e tests that test extended resources pass on current master, which shouldn't happen if admission logic is broken.

bart0sh avatar Dec 11 '25 13:12 bart0sh

@yliaog Figured it out. Even if resource is not filtered out by the removeMissingExtendedResources, it will be correctly handled by the generalFilter call, which ends up with calling fitRequest down the call stack and then shouldDelegateResourceToDRA which checks node allocatable value.

Long story short - we can leave the code as is as it works correctly or we can modify it, but only for the case when DRAExtendedResources feature gate is enabled. I tend to leave it as it is, but it's debatable.

bart0sh avatar Dec 11 '25 18:12 bart0sh

i think it works currently because we allow draManager == nil (for this kubelet admission check) (see https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/plugins/noderesources/fit.go#L258)

yliaog avatar Dec 12 '25 22:12 yliaog

You're right. So, should we query DeviceClasses directly there if draManager == nil? We can't use ExtendedResourceCache from Kubelet, can we?

bart0sh avatar Dec 15 '25 10:12 bart0sh

I think I found the way to check it: if pod.Status.ExtendedResourceClaimStatus is not nil and the resource and container name from the RequestMappings match resource and container we can say that this resource is handled by DRA. I'll add this check to the removeMissingExtendedResources function.

bart0sh avatar Dec 15 '25 10:12 bart0sh