
No energy usage metrics for isolated CPU cores.

rszmigiel opened this issue on Jan 5, 2024 • 14 comments

### What happened?

I'm running a PoC with OpenShift 4.13 and Kepler 0.9.2, installed with the Kepler (Community) Operator. One of the use cases is to visualise the energy consumption of DPDK-enabled containers. These containers use isolated CPU cores on a Single Node OpenShift installation.

CPUs are isolated with the following PerformanceProfile:

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  additionalKernelArgs:
  - "rcupdate.rcu_normal_after_boot=0"
  - "efi=runtime"
  - "module_blacklist=irdma"
  cpu:
    isolated: "4-31,36-63,68-95,100-127"
    reserved: "0-3,32-35,64-67,96-99"
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - count: 64
        size: 1G
        node: 0
      - count: 64
        size: 1G
        node: 1
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ''
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: false
  workloadHints:
    realTime: false
    highPowerConsumption: false
    perPodPowerManagement: false
```

I also have workload partitioning configured (https://docs.openshift.com/container-platform/4.13/scalability_and_performance/enabling-workload-partitioning.html).

When I run a Pod that is configured to use isolated CPU cores, for instance:

```yaml
        resources:
          limits:
            memory: "24Gi"
            cpu: "50"
            hugepages-1Gi: 24Gi
          requests:
            memory: "24Gi"
            cpu: "50"
            hugepages-1Gi: 24Gi
```

and then run a sample workload to put some load on these cores, for instance:

```console
stress-ng --cpu 50 --io 2 --vm 50 --vm-bytes 1G --timeout 10m --metrics-brief
```
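
A quick way to confirm where the load actually lands before looking at Kepler (a minimal sketch; mpstat comes from the sysstat package, and the PID placeholder is whatever stress-ng reports):

```console
# Per-core utilisation over a 5-second window; the isolated cores from the
# PerformanceProfile (4-31,36-63,...) should be close to 100% busy.
$ mpstat -P ALL 5 1

# CPU affinity of one of the stress workers (PID is an example placeholder)
$ grep Cpus_allowed_list /proc/<stress-ng-pid>/status
```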

I can observe that the assigned CPU cores show high usage in the top output (screenshot attached),

but Kepler's power usage graphs don't reflect that; they're very flat (screenshot attached).

However, if I run the same Pod on shared (non-isolated) CPU cores, by removing the whole resources.requests and resources.limits sections, the Kepler graphs look much more reasonable (screenshot attached),

even though the workload is running on only a small portion of the non-isolated CPU cores (screenshot attached).

Therefore I conclude that Kepler does not show proper power usage when isolated CPU cores are being used.

### What did you expect to happen?

I'd like to see energy usage for both isolated and non-isolated CPU cores. This is very important for all high-throughput, low-latency workloads.

### How can we reproduce it (as minimally and precisely as possible)?

Install OpenShift 4.13 with the Kepler Community Operator, and configure a node with isolated CPU cores and workload partitioning. Run two pods, one using isolated CPU cores and one using shared CPU cores. Observe that energy usage metrics are collected only for the shared (non-isolated) CPU cores.
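
One way to verify this is to compare Kepler's per-container energy counters for the two pods in Prometheus. A sketch, assuming hypothetical pod names stress-isolated and stress-shared and the kepler_container_joules_total metric (metric and label names can vary between Kepler versions):

```console
# Energy attributed to each test pod over the last 10 minutes; with the bug,
# the pod pinned to isolated cores stays near zero while the shared one grows.
$ curl -sG http://<prometheus-host>:9090/api/v1/query \
    --data-urlencode 'query=sum by (pod_name) (increase(kepler_container_joules_total{pod_name=~"stress-isolated|stress-shared"}[10m]))'
```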

### Anything else we need to know?

No response

### Kepler image tag

quay.io/sustainable_computing_io/kepler-operator@sha256:ea9ec43b407a918efdaf4a0c8c7bba73cb04f35fe1b2169065926ae8c637e327

### Kubernetes version

```console
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"0c63f9da2694c080257111616c60005f32a5bf47", GitTreeState:"clean", BuildDate:"2023-10-20T23:17:10Z", GoVersion:"go1.20.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.9+636f2be", GitCommit:"e782f8ba0e57d260867ea108b671c94844780ef2", GitTreeState:"clean", BuildDate:"2023-10-20T19:28:29Z", GoVersion:"go1.19.13 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

$ oc version
Client Version: 4.14.1
Kustomize Version: v5.0.1
Server Version: 4.13.21
Kubernetes Version: v1.26.9+636f2be
```

### Cloud provider or bare metal

Baremetal SingleNodeOpenShift

### OS version

<details>

```console
# cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="413.92.202310210500-0"
VERSION_ID="4.13"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 413.92.202310210500-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.13/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.13"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.13"
OPENSHIFT_VERSION="4.13"
RHEL_VERSION="9.2"
OSTREE_VERSION="413.92.202310210500-0"

# uname -a
Linux XYZ1 5.14.0-284.36.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 5 08:11:31 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
```
</details>


### Install tools

<details>

</details>


### Kepler deployment config

<details>

On Kubernetes:
```console
$ oc get cm kepler-exporter-cm -n openshift-kepler-operator -o yaml
apiVersion: v1
data:
  BIND_ADDRESS: 0.0.0.0:9103
  CGROUP_METRICS: '*'
  CPU_ARCH_OVERRIDE: ""
  ENABLE_EBPF_CGROUPID: "true"
  ENABLE_GPU: "true"
  ENABLE_PROCESS_METRICS: "false"
  ENABLE_QAT: "false"
  EXPOSE_CGROUP_METRICS: "true"
  EXPOSE_HW_COUNTER_METRICS: "true"
  EXPOSE_IRQ_COUNTER_METRICS: "true"
  EXPOSE_KUBELET_METRICS: "true"
  KEPLER_LOG_LEVEL: "1"
  KEPLER_NAMESPACE: openshift-kepler-operator
  METRIC_PATH: /metrics
  MODEL_CONFIG: CONTAINER_COMPONENTS_ESTIMATOR=false
  REDFISH_PROBE_INTERVAL_IN_SECONDS: "60"
  REDFISH_SKIP_SSL_VERIFY: "true"
kind: ConfigMap
metadata:
  creationTimestamp: "2024-01-05T00:51:36Z"
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/managed-by: kepler-operator
    app.kubernetes.io/part-of: kepler
    sustainable-computing.io/app: kepler
  name: kepler-exporter-cm
  namespace: openshift-kepler-operator
  ownerReferences:
  - apiVersion: kepler.system.sustainable.computing.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Kepler
    name: kepler
    uid: 1d726dc5-4e3a-4e00-ad82-72c62728b414
  resourceVersion: "18871948"
  uid: 760f221f-3230-41d5-9bcd-3b028132bc9b

$ oc -n openshift-operators describe deployment kepler-operator-controller
Name:                   kepler-operator-controller
Namespace:              openshift-operators
CreationTimestamp:      Fri, 05 Jan 2024 01:50:23 +0100
Labels:                 app.kubernetes.io/component=manager
                        app.kubernetes.io/instance=controller-manager
                        app.kubernetes.io/name=deployment
                        app.kubernetes.io/part-of=kepler-operator
                        olm.deployment-spec-hash=7755955f67
                        olm.owner=kepler-operator.v0.9.2
                        olm.owner.kind=ClusterServiceVersion
                        olm.owner.namespace=openshift-operators
                        operators.coreos.com/kepler-operator.openshift-operators=
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app.kubernetes.io/component=manager,app.kubernetes.io/instance=controller-manager,app.kubernetes.io/part-of=kepler-operator
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/component=manager
                    app.kubernetes.io/instance=controller-manager
                    app.kubernetes.io/part-of=kepler-operator
  Annotations:      alm-examples:
                      [
                        {
                          "apiVersion": "kepler.system.sustainable.computing.io/v1alpha1",
                          "kind": "Kepler",
                          "metadata": {
                            "labels": {
                              "app.kubernetes.io/instance": "kepler",
                              "app.kubernetes.io/name": "kepler",
                              "app.kubernetes.io/part-of": "kepler-operator"
                            },
                            "name": "kepler"
                          },
                          "spec": {
                            "exporter": {
                              "deployment": {
                                "port": 9103
                              }
                            }
                          }
                        }
                      ]
                    capabilities: Basic Install
                    categories: Monitoring
                    containerImage: quay.io/sustainable_computing_io/kepler-operator:0.9.2
                    createdAt: 2023-11-01T12:15:43Z
                    description: Deploys and Manages Kepler on Kubernetes
                    kubectl.kubernetes.io/default-container: manager
                    olm.operatorGroup: global-operators
                    olm.operatorNamespace: openshift-operators
                    olm.targetNamespaces:
                    operatorframework.io/properties:
                      {"properties":[{"type":"olm.gvk","value":{"group":"kepler.system.sustainable.computing.io","kind":"Kepler","version":"v1alpha1"}},{"type":...
                    operators.operatorframework.io/builder: operator-sdk-v1.27.0
                    operators.operatorframework.io/project_layout: go.kubebuilder.io/v3
                    repository: https://github.com/sustainable-computing-io/kepler-operator
  Service Account:  kepler-operator-controller-manager
  Containers:
   manager:
    Image:      quay.io/sustainable_computing_io/kepler-operator:0.9.2
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /manager
    Args:
      --openshift
      --leader-elect
      --kepler.image=$(RELATED_IMAGE_KEPLER)
      --kepler.image.libbpf=$(RELATED_IMAGE_KEPLER_LIBBPF)
      --zap-log-level=5
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:      10m
      memory:   64Mi
    Liveness:   http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:  http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      RELATED_IMAGE_KEPLER:         quay.io/sustainable_computing_io/kepler:release-0.6.1
      RELATED_IMAGE_KEPLER_LIBBPF:  quay.io/sustainable_computing_io/kepler:release-0.6.1-libbpf
      OPERATOR_CONDITION_NAME:      kepler-operator.v0.9.2
    Mounts:                         <none>
  Volumes:                          <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   kepler-operator-controller-5d5767d64f (1/1 replicas created)
Events:          <none>
```

</details>


### Container runtime (CRI) and version (if applicable)

<details>

</details>


### Related plugins (CNI, CSI, ...) and versions (if applicable)

<details>

</details>

— rszmigiel, Jan 5, 2024

AFAIK CPU isolation removes a set of CPUs from the kernel's scheduling algorithm. Kepler adds a probe to the kernel's sched_switch tracepoint to calculate how much CPU time / how many CPU cycles a process uses, and attributes power usage based on those figures. So if a process is using a CPU that is outside the scheduler's control, the probe may never fire for that CPU, Kepler may not learn the process's CPU time/cycles to assign any power usage to it, and may not generate metrics for it.
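
This is easy to observe on the node itself. An illustrative sketch (not Kepler's own implementation): count sched_switch events per CPU with bpftrace; on isolated/nohz_full cores running a single pinned task the counters stay near zero, so a probe driven by that tracepoint has almost nothing to attribute.

```console
# Count scheduler context switches per CPU for 30 seconds, then exit.
# Isolated cores running one pinned, busy task show few or no events.
$ bpftrace -e 'tracepoint:sched:sched_switch { @switches[cpu] = count(); } interval:s:30 { exit(); }'
```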

Cc: @rootfs @marceloamaral

— vimalk78, Jan 8, 2024

In that case, could we obtain power usage metrics in alternative ways, even if they're not as detailed as with eBPF? For instance, to work around the issue described here I used the output of the ipmitool sdr command. It provides summarised power usage across all CPUs and memory installed in the system; still better than nothing ;-)
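
For reference, the node-level reading mentioned above can be taken with ipmitool directly (a sketch; which sensors exist depends on the BMC and vendor):

```console
# Chassis-level power reading via DCMI, if the BMC supports it
$ ipmitool dcmi power reading

# Or list the current/power sensors from the SDR, as used for the workaround
$ ipmitool sdr type "Current"
```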

— rszmigiel, Jan 8, 2024

@rszmigiel would you please use the kepler 0.7.2 container image?

— rootfs, Jan 8, 2024

cc @vprashar2929 @sthaha

— rootfs, Jan 8, 2024

I've used kepler-operator-bundle:0.10.0 and it works!

(screenshot attached)

Thank you!

— rszmigiel, Jan 9, 2024

great news! thanks for the update @rszmigiel

— rootfs, Jan 9, 2024

@rootfs I am really curious to know why it worked with libbpf but not with bcc. That's the only difference between the two Kepler versions; the approach to calculating the CPU cycles is the same in both.

— vimalk78, Jan 9, 2024

It seems it's still happening with the latest available version (left side of the graph), compared to 0.7.2 (reinstalled, on the right side).

(screenshot attached)

— iconeb, Aug 27, 2024

Reopening issue to continue investigation.

— sthaha, Aug 28, 2024

I tried to reproduce this scenario. On a machine with 20 cores, I isolated 2 cores and executed stress-ng on these isolated cores. Kepler is able to report energy usage for these processes.

Screencast from 2024-08-29 19-11-31.webm

Screenshot from 2024-08-29 19-14-25

— vimalk78, Aug 29, 2024

Since the cores are isolated, any task started without CPU pinning will not be allocated to the isolated cores; in this case CPUs 2 and 12 will not be loaded (see the pinning sketch below).
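
Loading the isolated cores requires pinning the task explicitly. A minimal sketch using the core numbers from this example:

```console
# Pin the stress workload onto the isolated cores 2 and 12; without taskset
# (or an equivalent cpuset) the scheduler keeps it off the isolated set.
$ taskset -c 2,12 stress-ng --cpu 2 --timeout 10m --metrics-brief
```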

Screencast from 2024-08-29 19-23-31.webm

— vimalk78, Aug 29, 2024

Cc: @iconeb PTAL

— vimalk78, Sep 1, 2024

I confirm we have a performance profile with reserved and isolated CPUs:

```console
# oc get performanceprofile upf-performance-profile -o json | jq -r .spec.cpu
{
  "isolated": "2-31,34-63,66-95,98-127",
  "reserved": "0-1,32-33,64-65,96-97"
}
```

They are correctly applied at the worker node's boot:

```console
# oc debug node/compute-0.0x4954.openshift.one -- cat /proc/cmdline
[...] intel_iommu=on iommu=pt systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=1 skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-31,34-63,66-95,98-127 tuned.non_isolcpus=00000003,00000003,00000003,00000003 systemd.cpu_affinity=0,1,32,64,33,65,96,97 intel_iommu=on iommu=pt isolcpus=managed_irq,2-31,34-63,66-95,98-127 nohz_full=2-31,34-63,66-95,98-127 nosoftlockup nmi_watchdog=0 mce=off rcutree.kthread_prio=11 default_hugepagesz=1G hugepagesz=1G hugepages=200 idle=poll rcu_nocb_poll tsc=perfect selinux=0 enforcing=0 noswap clock=pit audit=0 processor.max_cstate=1 intel_idle.max_cstate=0 rcupdate.rcu_normal_after_boot=0 softlockup_panic=0 console=ttyS0,115200n8 pcie_aspm=off pci=noaer firmware_class.path=/var/lib/firmware intel_pstate=disable
```
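
The isolation can also be read back from sysfs at runtime, independently of the boot parameters (a quick check on the node; the values mirror the cmdline above):

```console
# CPUs removed from the general scheduler and from periodic tick handling
$ cat /sys/devices/system/cpu/isolated
2-31,34-63,66-95,98-127
$ cat /sys/devices/system/cpu/nohz_full
2-31,34-63,66-95,98-127
```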

The Pod is running with requests and limits:

```console
$ oc get pod upf1 -o json | jq .spec.containers[0].resources
{
  "limits": {
    "cpu": "18",
    "hugepages-1Gi": "40Gi",
    "memory": "30Gi",
    "openshift.io/ens785_rn": "3"
  },
  "requests": {
    "cpu": "18",
    "hugepages-1Gi": "40Gi",
    "memory": "30Gi",
    "openshift.io/ens785_rn": "3"
  }
}
```

And on the worker node the task's CPU affinity is assigned as expected:

```console
taskset -pc 656722
pid 656722's current affinity list: 3-7,22-25,67-71,86-89
```

The strange thing is that the previous graph was created by running the same pod(s) in the same environment, only changing Kepler's version in the meantime.

I will try another round of tests to provide (if possible) further evidence.

— iconeb, Sep 4, 2024

I have tested Kepler on RHEL booted with isolated CPUs. The isolated CPUs were assigned to a VM. Kepler can capture the VM and report metrics. We have added this configuration to our CI.
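
For anyone recreating that kind of setup, a rough sketch of the host-side configuration (domain name and core numbers are placeholders; the actual CI configuration may differ):

```console
# Isolate cores 2-3 on the RHEL host (takes effect after a reboot)
$ sudo grubby --update-kernel=ALL --args="isolcpus=2,3 nohz_full=2,3"

# Pin the VM's vCPUs to the isolated cores
$ virsh vcpupin <domain> 0 2
$ virsh vcpupin <domain> 1 3
```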

— rootfs, Sep 12, 2024

@rszmigiel I am closing this as the new versions of Kepler (0.10.0 and above) should contain a fix for this. If this isn't the case, could you please open a new bug?

— sthaha, Aug 11, 2025