node-feature-discovery
nfd-topology-updater: Detect E/P cores and expose through attributes
Detect which type of core (P-core or E-core) each CPU is and expose the IDs through labels.
Hi @ozhuraki. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Deploy Preview for kubernetes-sigs-nfd ready!
| Name | Link |
|---|---|
| Latest commit | 2946157c6f7d8903307da737bc5a6c48172891be |
| Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-nfd/deploys/67599acefb726600082bb9a9 |
| Deploy Preview | https://deploy-preview-1737--kubernetes-sigs-nfd.netlify.app |
I have several concerns regarding this PR:
- Way of detecting cores: we should use `/sys/devices/cpu_*/*cpus` for getting information about P/E cores, not something else.
- Those `cpu_*` subdirs are present only on hybrid CPUs, so if there is only the single subdir `/sys/devices/cpu/`, only one type of core exists in the package and there is no need to expose those attributes at all. In other words: labels should be exposed only if the CPU is hybrid.
- Even on hybrid CPUs, those labels should not be populated by default, only when the configuration requests it. I agree with Markus that those labels are not directly consumable, so an extra step to enable them on hybrid CPUs is fine.
- Exposing it as attributes in the topology exporter is fine. There we can put them without a configuration option when running on hybrid CPUs.
- Names of the core types should not be hard-coded, but taken from sysfs, e.g.
$ grep "" /sys/devices/cpu_*/*cpus
/sys/devices/cpu_atom/cpus:16-19
/sys/devices/cpu_core/cpus:0-15
$
so, the labels should have "foo..atom" and "foo...core" names. @marquiz please suggest some good prefix instead of "foo..."
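Not part of the PR itself, but to make the suggestion above concrete, here is a minimal Go sketch of the sysfs-based detection, under the assumption that globbing `/sys/devices/cpu_*` and reading each `cpus` file is all that is needed; every function and variable name in it is made up for illustration and is not the actual NFD code.

```go
// Minimal sketch of the sysfs-based detection described above; not the NFD
// implementation. The cpu_* subdirs exist only on hybrid CPUs, so an empty
// result means "not hybrid, expose nothing".
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseCPUList expands a kernel cpulist string such as "0-15" or "0,2,4-7".
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	for _, part := range strings.Split(strings.TrimSpace(s), ",") {
		if part == "" {
			continue
		}
		lo, hi, isRange := strings.Cut(part, "-")
		if !isRange {
			hi = lo
		}
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, err
		}
		end, err := strconv.Atoi(hi)
		if err != nil {
			return nil, err
		}
		for id := start; id <= end; id++ {
			cpus = append(cpus, id)
		}
	}
	return cpus, nil
}

// coreTypes maps a core type name taken from sysfs ("atom", "core", ...) to
// the CPU IDs of that type, so nothing about the type names is hard-coded.
func coreTypes(sysfsRoot string) (map[string][]int, error) {
	types := map[string][]int{}
	dirs, err := filepath.Glob(filepath.Join(sysfsRoot, "devices", "cpu_*"))
	if err != nil {
		return nil, err
	}
	for _, dir := range dirs {
		data, err := os.ReadFile(filepath.Join(dir, "cpus"))
		if err != nil {
			continue
		}
		cpus, err := parseCPUList(string(data))
		if err != nil {
			continue
		}
		types[strings.TrimPrefix(filepath.Base(dir), "cpu_")] = cpus
	}
	return types, nil
}

func main() {
	types, err := coreTypes("/sys")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// On the example system above this prints the atom and core CPU ID
	// lists shown in the grep output.
	for name, cpus := range types {
		fmt.Printf("%s: %v\n", name, cpus)
	}
}
```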
@marquiz @kad
Thanks for the helpful input. Updated, please take a look
@marquiz
Thanks for the useful input! Updated, please take a look
@marquiz @kad
Thanks for the help! Updated, please take a look
@marquiz
Thanks, updated, please take a look
@marquiz
Thanks, updated, please take a look
/retest
One question, and sorry if I misunderstand how hybrid CPUs can be consumed in K8s: wouldn't it be more useful to just modify ResourceAggregator to advertise new resources for each CPU type in each Zone (NUMA node) by mapping the generic k8s cpu resource to each cpu_type? But I may be wrong, as I don't know how those E/P cores are currently assigned and consumed in Kubernetes.
@PiotrProkop
There's nothing special about it, it's just one way of exposing such information.
We can also modify ResourceAggregator to advertise new resources for each CPU type.
I will make an issue and try to add it in a separate PR.
I just don't know how this information, provided via attributes, can be useful to a custom scheduler without mapping it to the CPUs already in use. The NRT object only exposes info about allocated/free CPUs in a given Zone, without any info about which CPU IDs are used and which are free.
@PiotrProkop
Thanks, and sorry, I missed your comment.
Yes, it's merely exposing information about a particular kind of core, for workloads that just want a particular kind of core on hybrid CPUs.
I made issue #1964 to expose it through ResourceAggregator (so the custom scheduler could track allocated/free cores) and will try to add such a PR too.
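For illustration of what tracking per-core-type resources per zone could look like (this is not the ResourceAggregator code and not the real NodeResourceTopology API; the zone name and the `cpu-<type>` naming are assumptions), here is a small Go sketch that intersects the sysfs core-type lists with the CPUs of a NUMA zone:

```go
// Illustration only, not the ResourceAggregator code or the real
// NodeResourceTopology API: per-zone, per-core-type capacities derived by
// intersecting the sysfs core-type lists with the CPU IDs of each NUMA zone.
// The zone name and the "cpu-<type>" resource naming are assumptions.
package main

import "fmt"

// intersect counts how many CPU IDs appear in both lists.
func intersect(a, b []int) int {
	set := make(map[int]struct{}, len(a))
	for _, id := range a {
		set[id] = struct{}{}
	}
	n := 0
	for _, id := range b {
		if _, ok := set[id]; ok {
			n++
		}
	}
	return n
}

func main() {
	// CPU IDs per core type; the values match the grep output earlier in the thread.
	coreTypes := map[string][]int{
		"core": {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
		"atom": {16, 17, 18, 19},
	}
	// A single hypothetical NUMA zone covering all 20 CPUs.
	allCPUs := make([]int, 20)
	for i := range allCPUs {
		allCPUs[i] = i
	}
	zones := map[string][]int{"node-0": allCPUs}

	// A scheduler plugin could track per-type capacity per zone from something like this.
	for zone, zoneCPUs := range zones {
		for typ, typeCPUs := range coreTypes {
			fmt.Printf("%s: cpu-%s capacity=%d\n", zone, typ, intersect(typeCPUs, zoneCPUs))
		}
	}
}
```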
@uniemimu
Thanks, updated.
I still feel this information should then be exposed via labels that just say that a given node has E or P cores, without the numbers. Then no special scheduler plugin is needed, but I may be wrong about the use case here. I'll ask @ffromani to chime in.
@PiotrProkop
We initially had it through labels but later switched to annotations, since labels were too restrictive for exposing the IDs.
I can additionally expose it via labels that just say that a given node has E or P cores, as you suggest.
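A minimal Go sketch of that split, with hypothetical key names rather than the ones used in the PR: the label only records that a core type is present (enough for a plain nodeSelector), while the ID list goes into an annotation, since Kubernetes label values are limited to 63 characters from a restricted character set that does not allow commas.

```go
// Sketch of the label/annotation split described above, with hypothetical
// key names (not the ones used in this PR). A label only states that a core
// type is present, which a plain nodeSelector can match; the per-type CPU ID
// list goes into an annotation, because Kubernetes label values are limited
// to 63 characters from a restricted character set (no commas, for example).
package main

import (
	"fmt"
	"strconv"
	"strings"
)

func main() {
	// Core type -> CPU IDs, e.g. parsed from /sys/devices/cpu_*/cpus.
	coreTypes := map[string][]int{
		"core": {0, 1, 2, 3},
		"atom": {4, 5, 6, 7},
	}

	labels := map[string]string{}
	annotations := map[string]string{}
	for typ, cpus := range coreTypes {
		ids := make([]string, len(cpus))
		for i, id := range cpus {
			ids[i] = strconv.Itoa(id)
		}
		// Presence only: fits label value rules and is enough for node selection.
		labels["example.com/cpu-"+typ] = "true"
		// Full ID list: too free-form for a label value, fine as an annotation.
		annotations["example.com/cpu-"+typ+"-ids"] = strings.Join(ids, ",")
	}
	fmt.Println(labels)
	fmt.Println(annotations)
}
```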
@PiotrProkop
Added labels too, please take a look.
@ozhuraki: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-node-feature-discovery-build-image-cross-generic | 2946157c6f7d8903307da737bc5a6c48172891be | link | true | /test pull-node-feature-discovery-build-image-cross-generic |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/cc
got the ping, will review ASAP
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: marquiz, ozhuraki, uniemimu
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [marquiz]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle rotten`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
@ozhuraki, can you rebase so we can revive this initiative?
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle rotten`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten