
Exposing Hardware Topology through CRDs in Node Feature Discovery

swatisehgal opened this issue 4 years ago · 23 comments

Currently, Node Feature Discovery consists of nfd-master and nfd-worker. The former labels Kubernetes node objects; the latter detects features and communicates them to nfd-master. nfd-worker runs as a DaemonSet on all nodes in the cluster.

Resource Topology Exporter (KEP, Code) can fit nicely into the NFD architecture by:

  • enabling nfd-worker to gather node resource topology information
  • passing this information to nfd-master over gRPC
  • having nfd-master create CRD instances for all the worker nodes in the cluster

A design document explaining the approach and the proposed changes in detail is provided here.

Possible approaches for introducing Resource Topology Exporter into the NFD operand are summarized below:

  1. Option 1: Introduce Resource Topology Exporter as a new source or a new command-line option in nfd-worker. The new source (or command-line option) is where nfd-worker would gather hardware topology information. nfd-master would then update the NumaTopology CRD.

  2. Option 2 (preferred): Add a new helper daemon, e.g. nfd-node-topology (similar to nfd-worker).

  • The new helper gathers hardware topology information and sends it to nfd-master over gRPC through a separate endpoint.

  • The communication between nfd-node-topology and nfd-master takes place over this separate endpoint, which carries hardware topology information but can easily be extended to serve other CRDs.

  • nfd-master would update the NumaTopology CRD using the NodeTopologyRequest it receives from nfd-node-topology.

  • NOTE: The NFD operator would have to be enhanced to manage nfd-node-topology (just as it manages nfd-master and nfd-worker).

  • The advantage of this approach is that nfd-worker and nfd-master continue to work the same way they currently do.

Changes to the gRPC interface: introduce another gRPC endpoint for communication between the updater (nfd-worker or the new helper) and nfd-master, defined in a completely separate proto as shown below.

syntax = "proto3";

option go_package = "sigs.k8s.io/node-feature-discovery/pkg/topologyupdater";
import "github.com/k8stopologyawareschedwg/noderesourcetopology-api/pkg/apis/topology/v1alpha1/generated.proto";

package topologyupdater;

service NodeTopology {
    rpc UpdateNodeTopology(NodeTopologyRequest) returns (NodeTopologyResponse);
}
message NodeTopologyRequest {
    string nfd_version = 1;
    string node_name = 2;
    repeated string topology_policies = 3;
    repeated v1alpha1.Zone zones = 4;
}
message NodeTopologyResponse {
}
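
For illustration, below is a minimal Go sketch of how a topology updater daemon could call the proposed UpdateNodeTopology endpoint. The generated stub package path, the nfd-master address, and the example values are assumptions made for the sketch, not part of the proposal.

package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"

	v1alpha1 "github.com/k8stopologyawareschedwg/noderesourcetopology-api/pkg/apis/topology/v1alpha1"
	pb "sigs.k8s.io/node-feature-discovery/pkg/topologyupdater" // assumed location of the generated stubs
)

func main() {
	// Address of nfd-master's proposed topology endpoint; illustrative only.
	conn, err := grpc.Dial("nfd-master:8192", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("failed to connect to nfd-master: %v", err)
	}
	defer conn.Close()

	client := pb.NewNodeTopologyClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Report the per-zone resources discovered on this node.
	_, err = client.UpdateNodeTopology(ctx, &pb.NodeTopologyRequest{
		NfdVersion:       "v0.8.0",
		NodeName:         "worker-0",
		TopologyPolicies: []string{"single-numa-node"},
		Zones: []*v1alpha1.Zone{
			{Name: "node-0", Type: "Node"},
		},
	})
	if err != nil {
		log.Fatalf("UpdateNodeTopology failed: %v", err)
	}
}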


Additionally, we propose that NFD become the home for the NodeResourceTopology CRD API definition (and its informer, clientset, handlers, etc.).

swatisehgal avatar Aug 10 '20 14:08 swatisehgal

I'd suggest using a more generic name for the package, "cr-updater" or something.

Regarding the data/CRD, I was pondering whether we should prepare for more detailed information, e.g. bake in, or make it possible to present, NUMA distances(?)

marquiz avatar Aug 12 '20 14:08 marquiz

@marquiz Sure, we can make the package name more generic. Can you elaborate on how you see information related to NUMA distances being exposed here? To gather resource information out of tree, we currently rely on the podResources API. As our only source of truth for device information is the (device manager in the) kubelet, we have device information there (health, device IDs, and NUMA node as part of TopologyInfo) but need to extend the podResources API (as proposed here: kubernetes/enhancements#1884). NUMA distances cannot be obtained from the kubelet, as they are not currently supported there.

As per the design of the Topology Manager, the CPU Manager and Device Manager implement the hint provider interface and provide NUMA node IDs as hints to the Topology Manager. We are open to suggestions here, but as a starting point we could expose the constructs in terms of NUMA node IDs, as that aligns with the current Topology Manager design and, more importantly, is the only information at our disposal right now. We would be more than happy to update this later, when topology information in the Topology Manager becomes more descriptive and the API evolves to let us expose it. Let me know your thoughts.
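
As a concrete illustration of that flow, here is a hedged Go sketch of reading device assignments (including per-device NUMA affinity) from the kubelet podresources API. It assumes the v1 API with the per-device TopologyInfo extension discussed above and the default socket path; both may differ in a given cluster.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"

	podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)

func main() {
	// Default kubelet podresources socket; configurable per deployment.
	conn, err := grpc.Dial("unix:///var/lib/kubelet/pod-resources/kubelet.sock", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("failed to connect to kubelet: %v", err)
	}
	defer conn.Close()

	client := podresourcesv1.NewPodResourcesListerClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	resp, err := client.List(ctx, &podresourcesv1.ListPodResourcesRequest{})
	if err != nil {
		log.Fatalf("podresources List failed: %v", err)
	}

	// Each assigned device carries its NUMA affinity once the kubelet exposes TopologyInfo.
	for _, pod := range resp.GetPodResources() {
		for _, cnt := range pod.GetContainers() {
			for _, dev := range cnt.GetDevices() {
				fmt.Printf("pod=%s container=%s resource=%s ids=%v topology=%v\n",
					pod.GetName(), cnt.GetName(), dev.GetResourceName(), dev.GetDeviceIds(), dev.GetTopology())
			}
		}
	}
}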

swatisehgal avatar Aug 13 '20 10:08 swatisehgal

A way we could expose the NUMA distances might look like:

message NUMANodeResource {
    int32 numa_id = 1;
    map<string, string> resources = 2;
    map<int32, int32> distances = 3;
}

That said, I second what @swatisehgal wrote: we were using the podresources API (and the extensions we were proposing here) as the source of truth. However, if there is interest specifically in NUMA distances, we could probably learn them reliably from sysfs, much like numactl does. That should be pretty safe to reconcile, though it could require access to more system resources.
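
For reference, a minimal sketch of what reading NUMA distances from sysfs could look like in Go. The paths follow the standard Linux layout under /sys/devices/system/node; this is illustrative only, not the proposed implementation.

package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

func main() {
	// Each /sys/devices/system/node/node<N>/distance file holds the distances
	// from node N to every NUMA node, as a space-separated list.
	nodeDirs, err := filepath.Glob("/sys/devices/system/node/node[0-9]*")
	if err != nil {
		log.Fatal(err)
	}

	for _, dir := range nodeDirs {
		data, err := os.ReadFile(filepath.Join(dir, "distance"))
		if err != nil {
			log.Fatalf("reading %s: %v", dir, err)
		}

		// Map of target NUMA node ID -> distance from this node.
		distances := make(map[int]int)
		for target, field := range strings.Fields(strings.TrimSpace(string(data))) {
			d, err := strconv.Atoi(field)
			if err != nil {
				log.Fatalf("parsing %s: %v", dir, err)
			}
			distances[target] = d
		}
		fmt.Printf("%s: %v\n", filepath.Base(dir), distances)
	}
}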

ffromani avatar Aug 13 '20 14:08 ffromani

Thanks @fromanirh for this code snippet. Getting NUMA node distances from sysfs is totally doable, and populating them as part of NUMANodeResource makes sense! In my previous comment I was considering NUMA distance from the point of view of resources and their distance from each other.

swatisehgal avatar Aug 13 '20 15:08 swatisehgal

I don't have the answers myself either, just throwing out some ideas 😊 I was trying to think a bit ahead and more broadly, and not "cement" this too tightly for this one use case and scheduler extension. When we add something we should at least try to think about other possible users and usages too, e.g. other scheduler extensions and possibly alternative RTE daemons digging up more detailed data. I'm just trying to avoid the situation where we quickly patch together the API for one narrow use case and, after a while, we're stuck with it and have problems serving new users. This case is simple, but still. Maybe we have something to learn from how the Linux kernel handles APIs 😄

marquiz avatar Aug 13 '20 18:08 marquiz

@marquiz Thanks for the input. I see your point and have updated the design document to include NUMA node distances as part of the NUMANodeResource structure.

swatisehgal avatar Aug 18 '20 18:08 swatisehgal

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Nov 16 '20 19:11 fejta-bot

/remove-lifecycle stale

swatisehgal avatar Nov 18 '20 18:11 swatisehgal

Development work is in progress to support this feature. PR will be linked here soon!

swatisehgal avatar Nov 18 '20 18:11 swatisehgal

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Feb 16 '21 18:02 fejta-bot

/remove-lifecycle stale

zvonkok avatar Feb 16 '21 18:02 zvonkok

| No | Task Breakdown | PR | Owner | Status |
|----|----------------|----|-------|--------|
| 1 | NFD Topology Updater Implementation | https://github.com/kubernetes-sigs/node-feature-discovery/pull/525 | @swatisehgal | Merged |
| 2 | NFD Topology Updater Documentation | https://github.com/kubernetes-sigs/node-feature-discovery/pull/526 | @swatisehgal | Merged |
| 3 | NFD Topology Updater E2E tests | https://github.com/kubernetes-sigs/node-feature-discovery/pull/528 | @fromanirh | |
| 4 | Enable Configz Endpoint for obtaining Kubelet config | https://github.com/kubernetes-sigs/node-feature-discovery/pull/530 | @Tal-or | |
| 5 | Allow excluding of specific resource accounting from NRT's objects | https://github.com/kubernetes-sigs/node-feature-discovery/pull/545 | @Tal-or | |
| 6 | NFD Topology Updater: Add memory accounting in NRT | https://github.com/kubernetes-sigs/node-feature-discovery/pull/593 | @cynepco3hahue | Merged |
| 7 | Simplify accounting logic in NFD-topology Updater | TBD | @fromanirh | |
| 8 | NFD-topology Updater Helm deployment if topologyupdater explicitly enabled | https://github.com/kubernetes-sigs/node-feature-discovery/pull/623 | @zwpaper | Merged |
| 9 | Event based NFD-topology Updater: ensure that CRs are updated on every device plugin and/or pod creation/deletion event, as opposed to the current timer-based approach | TBD | TBD | |

swatisehgal avatar May 13 '21 09:05 swatisehgal

Expect the missing two PRs by Monday.

ffromani avatar May 14 '21 06:05 ffromani

| No | Task Breakdown | PR | Owner |
|----|----------------|----|-------|
| 1 | NFD Topology Updater Implementation | https://github.com/kubernetes-sigs/node-feature-discovery/pull/525 | @swatisehgal |
| 2 | NFD Topology Updater Documentation | https://github.com/kubernetes-sigs/node-feature-discovery/pull/526 | @swatisehgal |
| 3 | NFD Topology Updater E2E tests | https://github.com/kubernetes-sigs/node-feature-discovery/pull/528 | @fromanirh |
| 4 | Enable Configz Endpoint for obtaining Kubelet config | https://github.com/kubernetes-sigs/node-feature-discovery/pull/530 | @Tal-or |

ffromani avatar May 18 '21 09:05 ffromani

/label kind/feature

ArangoGutierrez avatar Aug 11 '21 21:08 ArangoGutierrez

@ArangoGutierrez: The label(s) /label kind/feature cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda

In response to this:

/label kind/feature

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 11 '21 21:08 k8s-ci-robot

/kind feature

ArangoGutierrez avatar Aug 11 '21 21:08 ArangoGutierrez

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 07 '22 14:02 k8s-triage-robot

/remove-lifecycle stale

ffromani avatar Feb 07 '22 14:02 ffromani

Hey all, any updates on this feature set?

ArangoGutierrez avatar Mar 23 '22 22:03 ArangoGutierrez

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 21 '22 22:06 k8s-triage-robot

There has been progress lately on the e2e tests and the /configz endpoint.

/remove-lifecycle stale

marquiz avatar Jul 08 '22 12:07 marquiz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 06 '22 13:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 05 '22 13:11 k8s-triage-robot

This is still alive and being tracked.

/remove-lifecycle rotten

marquiz avatar Nov 08 '22 13:11 marquiz

@ffromani @Tal-or @swatisehgal are we expecting new PRs here, or should we close this issue? We could also track the missing pieces as separate issues.

marquiz avatar Jan 20 '23 08:01 marquiz

@ffromani @PiotrProkop @Tal-or @swatisehgal should we close this issue as implemented and track any future enhancements in separate issues?

marquiz avatar Mar 03 '23 11:03 marquiz

@ffromani @PiotrProkop @Tal-or @swatisehgal should we close this issue as implemented and track any future enhancements in separate issues?

From my perspective, yes: we're now in pretty good shape and can track future enhancements in separate issues.

ffromani avatar Mar 03 '23 11:03 ffromani

@ffromani @PiotrProkop @Tal-or @swatisehgal should we close this issue as implemented and track any future enhancements in separate issues?

+1. Nice to see that we have come a long way :) Let's close this issue. Thanks!

swatisehgal avatar Mar 03 '23 12:03 swatisehgal

Closing as suggested. Thanks to everybody involved for working on this 🎉

/close

marquiz avatar Mar 03 '23 13:03 marquiz