nfd-topology-updater misses some CPUs for Guaranteed pods
What happened:
nfd-topology-updater reports the node's NUMA topology and the CPUs used by Guaranteed pods. However, it misses the exclusive CPUs of a Guaranteed pod when any of the pod's containers has no exclusive CPUs.
- Node
yj-kubevirtwork-001's kubelet is configured with the single-numa-node topology-manager policy and the static cpu-manager policy:
--cpu-manager-policy=static --topology-manager-policy=single-numa-node
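For reference, the same policies can also be set through the kubelet configuration file instead of command-line flags. A minimal sketch showing only the two relevant fields:

```yaml
# KubeletConfiguration fragment equivalent to the flags above
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
topologyManagerPolicy: single-numa-node
```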
I have two pods on the node:
root@yj-kubevirtmaster-01 ~ # kubectl get po -owide nginx-deployment-68589dd8dc-kzvzh virt-launcher-kubevirttesttest1-p545b
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-68589dd8dc-kzvzh 1/1 Running 0 2d16h 10.106.66.199 yj-kubevirtwork-001 <none> <none>
virt-launcher-kubevirttesttest1-p545b 3/3 Running 0 4h20m 10.106.66.177 yj-kubevirtwork-001 <none> 1/1
The CPUs allocated by the cpu-manager on the node:
cat /var/lib/kubelet/cpu_manager_state | jq
{
"policyName": "static",
"defaultCpuSet": "1,3,24-25,48-51,53,55,76-77,100-103",
"entries": {
"609526ad-0de0-4543-8188-48f872576fc9": {
"compute": "26-34,78-86"
},
"64dc405a-1a9c-4029-ba5f-778fe5948a10": {
"compute": "0,2,4-9,52,54,56-61"
},
"8310086e-bc8b-4ab3-ac6e-ccc1129fecec": {
"compute": "35-47,87-99"
},
"98f43841-5488-49d7-9f79-29fa68791316": {
"nginx": "14-15,66-67"
},
"b4fd6f12-9cf5-459e-8fdb-3058a2ae22a3": {
"compute": "10-13,16-23,62-65,68-75"
}
},
"checksum": 2485504922
}
But in the noderesourcetopologies CR, only nginx's CPU set is reported; the kubevirt virt-launcher's CPUs are not:
kubectl get noderesourcetopologies yj-kubevirtwork-001 -oyaml
apiVersion: topology.node.k8s.io/v1alpha2
attributes:
- name: topologyManagerPolicy
value: single-numa-node
- name: topologyManagerScope
value: container
- name: nodeTopologyPodsFingerprint
value: pfp0v001828baea44880d152
kind: NodeResourceTopology
metadata:
creationTimestamp: "2024-12-05T13:13:40Z"
generation: 66528
name: yj-kubevirtwork-001
ownerReferences:
- apiVersion: v1
kind: Namespace
name: node-feature-discovery
uid: 4072a5ce-0634-447e-ab31-dd3f4bc7abac
resourceVersion: "27862374"
uid: dd26aaf4-3f38-4708-bb86-89305475a715
topologyPolicies:
- SingleNUMANodeContainerLevel
zones:
- costs:
- name: node-0
value: 21
- name: node-1
value: 10
name: node-1
resources:
- allocatable: "52"
available: "52"
capacity: "52"
name: cpu
- allocatable: "201847795712"
available: "157778243583"
capacity: "201847795712"
name: memory
type: Node
- costs:
- name: node-0
value: 10
- name: node-1
value: 21
name: node-0
resources:
- allocatable: "50"
available: "46"
capacity: "52"
name: cpu
- allocatable: "183845359616"
available: "130691427327"
capacity: "201130086400"
name: memory
type: Node
In the nfd-topology-updater logs:
I1213 06:06:41.791096 1 nfd-topology-updater.go:235] "received updated pod resources" podResources=<
- Containers:
- Name: nginx
Resources:
- Data:
- "66"
- "67"
- "14"
- "15"
Name: cpu
NumaNodeIds: null
- Data:
- "8589934592"
Name: memory
NumaNodeIds:
- 0
Name: nginx-deployment-68589dd8dc-kzvzh
Namespace: default
- Containers:
- Name: compute
Resources:
- Data:
- vhost-net172
Name: devices.kubevirt.io/vhost-net
NumaNodeIds: null
- Data:
- kvm760
Name: devices.kubevirt.io/kvm
NumaNodeIds: null
- Data:
- tun96
Name: devices.kubevirt.io/tun
NumaNodeIds: null
- Data:
- "9141485568"
Name: memory
NumaNodeIds:
- 0
- Name: volumecontainerdisk
Resources:
- Data:
- "40000000"
Name: memory
NumaNodeIds:
- 0
- Name: guest-console-log
Resources:
- Data:
- "60000000"
Name: memory
NumaNodeIds:
- 0
Name: virt-launcher-kubevirttesttest2-8dnfj
Namespace: default
>
nfd-topology-updater did not get the CPUs of the virt-launcher pod's compute container.
What you expected to happen:
nfd-topology-updater should report the exclusive CPUs of all pod containers.
How to reproduce it (as minimally and precisely as possible):
For example, create a pod Pod-a with 2 containers:
containers:
- name: ctr-a
resources:
limits:
cpu: 5
requests:
cpu: 5
- name: ctr-b
resources:
limits:
cpu: 500m
requests:
cpu: 500m
The pod is still a Guaranteed pod and is allocated exclusive CPUs by the kubelet's cpu-manager, but nfd-topology-updater will not report its CPUs in the topology CR.
Anything else we need to know?:
I think this bug is caused by the hasExclusiveCPUs func:
https://github.com/kubernetes-sigs/node-feature-discovery/blob/141607269983adbf13f7f84b3f02314e49c3d278/pkg/resourcemonitor/podresourcesscanner.go#L78-L107
If the pod has any container that does not request CPU as an integer, it returns false.
https://github.com/kubernetes-sigs/node-feature-discovery/blob/141607269983adbf13f7f84b3f02314e49c3d278/pkg/resourcemonitor/podresourcesscanner.go#L168-L183
This causes nfd-topology-updater to skip the loop over all of this pod's containers, so none of their CPUs are reported.
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
@AllenXu93 it is a deliberate decision/feature to only count exclusively allocated CPUs (of Guaranteed pods). Could you open up your usage scenario(s) a bit for counting also cpus in the shared pool? Maybe we could have a config option/cmdline flag to enable this.
@PiotrProkop what are your thoughts on this?
EDIT: @AllenXu93 see https://kubernetes-sigs.github.io/node-feature-discovery/stable/usage/nfd-topology-updater.html
This PR still only counts exclusively allocated CPUs; that doesn't change. But "exclusively allocated CPUs" does not mean that all of a pod's containers have exclusive CPUs. For example, if a pod has two containers, one with 2-core CPU requests and limits and one with 500m CPU requests and limits, the pod is still a Guaranteed pod, and the first container gets exclusive CPUs as expected. But currently topo-updater will skip this pod's CPU report entirely.
Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.
I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).
WDYT @PiotrProkop @ffromani, something we're missing here?
I think there's a good point here. Need to review the logic and count all the exclusively allocated CPUs
In fact, I use KubeVirt to manage VMs with Kubernetes, and all KubeVirt pods are created this way: the VM container requests integral CPUs, while many other containers and init containers request only 200m CPU.
But the VM container's CPUs are allocated within one NUMA node when the kubelet enables single-numa-node, so nfd-topology-updater missing them causes errors when scheduling other pods.
I'm a bit unsure about the QoS here and I need to review (again) the rules, but I totally agree that all the containers which have exclusive CPUs allocated in guaranteed QoS pods should be reported
Of course. I learned from the docs https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed
In my opinion: a pod is Guaranteed when every container in the pod has CPU and memory limits and requests, and the limits equal the requests.
@AllenXu93 you made good points; there could be an actual bug in the area you identified. We can use this issue to add more unit tests. I'll try to have a look ASAP, but the European holiday season is incoming. Others are welcome to chime in; I'll surely review.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I think this issue is still valid/relevant
/reopen
/remove-lifecycle rotten
@marquiz: Reopened this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.