
NFD does not reveal all the features in cpuid for AMD Ryzen

Open ionutnechita opened this issue 2 years ago • 3 comments

Hi NFD Team,

I tried to deploy NFD in a Kind cluster today, but I noticed differences between what NFD and cpuid discover: NFD reports fewer CPU features than cpuid does.

  • NFD version: k8s.gcr.io/nfd/node-feature-discovery:v0.11.1
  • CPUID Go implementation version: cpuid-Linux_x86_64_2.0.14
  • OS: openSUSE Tumbleweed 20220706
  • Kind version: 0.14.0
  • CPU: AMD Ryzen 9 5900HS with Radeon Graphics

Attachments:

  • cpuid output: cpuid-go.txt
  • NFD labels + workers: kubectl-nodes.txt
  • NFD features: features-nfd.txt
  • cpuid features: features-cpuid.txt

Diff (features reported by cpuid but missing from the NFD labels): BMI1, BMI2, CLMUL, CMOV, CX16, ERMS, F16C, HTT, LZCNT, MMX, MMXEXT, MSR_PAGEFLUSH, NX, POPCNT, RDRAND, RDSEED, RDTSCP, SEV, SEV_64BIT, SEV_ALTERNATIVE, SEV_DEBUGSWAP, SEV_ES, SEV_RESTRICTED, SME, SSE, SSE2, SSE3, SSE4, SSE42, SSSE3, VTE, XGETBV1, XSAVEC, XSAVEOPT, XSAVES
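A diff like the one above can be reproduced from the two attachments with comm(1); here is a minimal sketch, where the printf lines are sample data standing in for the real features-cpuid.txt and features-nfd.txt contents:

```shell
# Sample data standing in for the attached features-cpuid.txt and
# features-nfd.txt; on a real node, dump one feature name per line.
printf 'AESNI\nAVX\nBMI1\nSSE42\n' | sort > features-cpuid.txt
printf 'AESNI\nAVX\n' | sort > features-nfd.txt
# comm -23 prints lines unique to the first sorted input, i.e. features
# that cpuid reports but NFD does not label.
comm -23 features-cpuid.txt features-nfd.txt
# prints: BMI1 and SSE42
```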

BR, Ionut Nechita ionutnechita

ionutnechita avatar Jul 09 '22 13:07 ionutnechita

Hi @ionutnechita. By default, those CPU features are excluded from publishing via attributeBlacklist. If you want to enable them, I would suggest you either edit your ConfigMap (namespace: node-feature-discovery, name: nfd-worker-conf if you are using the default NFD manifests) so that the attributeBlacklist is empty, or make the same change in the default worker configuration file used to generate that ConfigMap. An example config file would look like:

    #core:
    #  labelWhiteList:
    #  noPublish: false
    #  sleepInterval: 60s
    #  featureSources: [all]
    #  labelSources: [all]
    #  klog:
    #    addDirHeader: false
    #    alsologtostderr: false
    #    logBacktraceAt:
    #    logtostderr: true
    #    skipHeaders: false
    #    stderrthreshold: 2
    #    v: 0
    #    vmodule:
    ##   NOTE: the following options are not dynamically run-time configurable
    ##         and require a nfd-worker restart to take effect after being changed
    #    logDir:
    #    logFile:
    #    logFileMaxSize: 1800
    #    skipLogHeaders: false
    sources:
      cpu:
        cpuid:
          attributeBlacklist:
    #  kernel:
    #    kconfigFile: "/path/to/kconfig"
    #    configOpts:
    #      - "NO_HZ"
    #      - "X86"
    #      - "DMI"

Ref: https://kubernetes-sigs.github.io/node-feature-discovery/v0.11/advanced/worker-configuration-reference.html#sourcescpu

fmuyassarov avatar Sep 02 '22 09:09 fmuyassarov

Here is an example of the labels set on my minikube node after removing the default blacklisted features:

$ kubectl get nodes minikube-m02 -ojson | jq .metadata.labels
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/os": "linux",
  "feature.node.kubernetes.io/cpu-cpuid.ADX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.AESNI": "true",
  "feature.node.kubernetes.io/cpu-cpuid.AVX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.AVX2": "true",
  "feature.node.kubernetes.io/cpu-cpuid.BMI1": "true",
  "feature.node.kubernetes.io/cpu-cpuid.BMI2": "true",
  "feature.node.kubernetes.io/cpu-cpuid.CLMUL": "true",
  "feature.node.kubernetes.io/cpu-cpuid.CMOV": "true",
  "feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8": "true",
  "feature.node.kubernetes.io/cpu-cpuid.CX16": "true",
  "feature.node.kubernetes.io/cpu-cpuid.ERMS": "true",
  "feature.node.kubernetes.io/cpu-cpuid.F16C": "true",
  "feature.node.kubernetes.io/cpu-cpuid.FMA3": "true",
  "feature.node.kubernetes.io/cpu-cpuid.FXSR": "true",
  "feature.node.kubernetes.io/cpu-cpuid.FXSROPT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.HTT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.IBPB": "true",
  "feature.node.kubernetes.io/cpu-cpuid.LAHF": "true",
  "feature.node.kubernetes.io/cpu-cpuid.LZCNT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MMX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MMXEXT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MOVBE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.MPX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.NX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.OSXSAVE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.POPCNT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.RDRAND": "true",
  "feature.node.kubernetes.io/cpu-cpuid.RDSEED": "true",
  "feature.node.kubernetes.io/cpu-cpuid.RDTSCP": "true",
  "feature.node.kubernetes.io/cpu-cpuid.RTM_ALWAYS_ABORT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SCE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SGX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SSE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SSE2": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SSE3": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SSE4": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SSE42": "true",
  "feature.node.kubernetes.io/cpu-cpuid.SSSE3": "true",
  "feature.node.kubernetes.io/cpu-cpuid.STIBP": "true",
  "feature.node.kubernetes.io/cpu-cpuid.VMX": "true",
  "feature.node.kubernetes.io/cpu-cpuid.X87": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XGETBV1": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XSAVE": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XSAVEC": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XSAVEOPT": "true",
  "feature.node.kubernetes.io/cpu-cpuid.XSAVES": "true",
  "feature.node.kubernetes.io/cpu-cstate.enabled": "true",
  "feature.node.kubernetes.io/cpu-hardware_multithreading": "true",
  "feature.node.kubernetes.io/cpu-model.family": "6",
  "feature.node.kubernetes.io/cpu-model.id": "142",
  "feature.node.kubernetes.io/cpu-model.vendor_id": "Intel",
  "feature.node.kubernetes.io/cpu-pstate.scaling_governor": "powersave",
  "feature.node.kubernetes.io/cpu-pstate.status": "active",
  "feature.node.kubernetes.io/cpu-pstate.turbo": "true",
  "feature.node.kubernetes.io/kernel-version.full": "5.15.0-46-generic",
  "feature.node.kubernetes.io/kernel-version.major": "5",
  "feature.node.kubernetes.io/kernel-version.minor": "15",
  "feature.node.kubernetes.io/kernel-version.revision": "0",
  "feature.node.kubernetes.io/pci-0300_8086.present": "true",
  "feature.node.kubernetes.io/storage-nonrotationaldisk": "true",
  "feature.node.kubernetes.io/system-os_release.ID": "ubuntu",
  "feature.node.kubernetes.io/system-os_release.VERSION_ID": "20.04",
  "feature.node.kubernetes.io/system-os_release.VERSION_ID.major": "20",
  "feature.node.kubernetes.io/system-os_release.VERSION_ID.minor": "04",
  "feature.node.kubernetes.io/usb-ef_13d3_5694.present": "true",
  "feature.node.kubernetes.io/usb-ff_04f3_0903.present": "true",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "minikube-m02",
  "kubernetes.io/os": "linux"
}
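Once published, these labels can be consumed like any other node label; for example, a Pod could be pinned to SSE4.2-capable nodes with a nodeSelector. This is a hypothetical sketch, not from this thread; the Pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: needs-sse42            # hypothetical name
spec:
  nodeSelector:
    # Schedule only where NFD published the SSE4.2 cpuid label.
    feature.node.kubernetes.io/cpu-cpuid.SSE42: "true"
  containers:
  - name: app
    image: busybox             # placeholder image
    command: ["sleep", "infinity"]
```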

fmuyassarov avatar Sep 02 '22 10:09 fmuyassarov

Hi @ionutnechita. Is this issue still relevant, or can we close it?

fmuyassarov avatar Sep 26 '22 20:09 fmuyassarov

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 25 '22 20:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 24 '23 21:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Feb 23 '23 21:02 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 23 '23 21:02 k8s-ci-robot