# No energy usage metrics for isolated CPU cores
### What happened?
I'm running a PoC with OpenShift 4.13 and Kepler 0.9.2 installed via the Kepler (Community) Operator. One of the use cases is to visualise the energy consumption of DPDK-enabled containers. These containers use isolated CPU cores on a Single Node OpenShift installation.
CPUs are isolated with the following PerformanceProfile:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  additionalKernelArgs:
  - "rcupdate.rcu_normal_after_boot=0"
  - "efi=runtime"
  - "module_blacklist=irdma"
  cpu:
    isolated: "4-31,36-63,68-95,100-127"
    reserved: "0-3,32-35,64-67,96-99"
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 64
      size: 1G
      node: 0
    - count: 64
      size: 1G
      node: 1
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ''
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: false
  workloadHints:
    realTime: false
    highPowerConsumption: false
    perPodPowerManagement: false
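As a sanity check, the applied isolation can be verified on the node by looking for the isolation arguments on the kernel command line (the node name below is a placeholder), mirroring the check used later in this thread:

```console
# Verify the isolation arguments made it onto the kernel command line
$ oc debug node/<node-name> -- cat /proc/cmdline | tr ' ' '\n' | grep -E 'isolcpus|nohz_full'
```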
I also have workload partitioning configured (https://docs.openshift.com/container-platform/4.13/scalability_and_performance/enabling-workload-partitioning.html).
When I run a Pod that's configured to use isolated CPU cores, for instance:
resources:
  limits:
    memory: "24Gi"
    cpu: "50"
    hugepages-1Gi: 24Gi
  requests:
    memory: "24Gi"
    cpu: "50"
    hugepages-1Gi: 24Gi
and then run a sample workload to put some load on these cores, for instance:
stress-ng --cpu 50 --io 2 --vm 50 --vm-bytes 1G --timeout 10m --metrics-brief
I can observe that the assigned CPU cores show high usage in the top output:
but Kepler's power usage graphs don't reflect that - they're very flat:
However, if I run the same Pod on shared (non-isolated) CPU cores by removing the whole resources.requests and resources.limits sections, the Kepler graphs look much more reasonable:
even though the workload is running on only a small portion of the non-isolated CPU cores:
Therefore I conclude that Kepler does not report power usage correctly when isolated CPU cores are being used.
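A quick way to double-check on the node that the stress workload really landed on the isolated cores (a sketch; the PIDs will of course differ):

```console
# Show the CPU affinity of each stress-ng process; the lists should fall within
# the isolated range 4-31,36-63,68-95,100-127 from the PerformanceProfile above
$ for pid in $(pgrep stress-ng); do taskset -pc "$pid"; done
```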
### What did you expect to happen?
I'd like to see energy usage for both isolated and non-isolated CPU cores. This is very important for high-throughput, low-latency workloads.
### How can we reproduce it (as minimally and precisely as possible)?
Install OpenShift 4.13 with the Kepler Community Operator, then configure a node with isolated CPU cores and workload partitioning. Run two Pods, one using isolated CPU cores and one using shared CPU cores. Observe that energy usage metrics are collected only for the shared (non-isolated) CPU cores.
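To compare the two Pods, the Kepler exporter can be scraped directly. A minimal sketch, assuming the default port 9103 from the ConfigMap below, the `app.kubernetes.io/component=exporter` pod label, and the `kepler_container_joules_total` metric name (adjust to the actual deployment):

```console
# Port-forward one Kepler exporter pod and compare the per-container energy counters
$ KPOD=$(oc -n openshift-kepler-operator get pods -l app.kubernetes.io/component=exporter -o name | head -n1)
$ oc -n openshift-kepler-operator port-forward "$KPOD" 9103:9103 &
$ curl -s http://localhost:9103/metrics | grep kepler_container_joules_total
```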
### Anything else we need to know?
No response
### Kepler image tag
<details>
</details>
### Kubernetes version
<details>
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"0c63f9da2694c080257111616c60005f32a5bf47", GitTreeState:"clean", BuildDate:"2023-10-20T23:17:10Z", GoVersion:"go1.20.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.9+636f2be", GitCommit:"e782f8ba0e57d260867ea108b671c94844780ef2", GitTreeState:"clean", BuildDate:"2023-10-20T19:28:29Z", GoVersion:"go1.19.13 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}
$ oc version
Client Version: 4.14.1
Kustomize Version: v5.0.1
Server Version: 4.13.21
Kubernetes Version: v1.26.9+636f2be
</details>
### Cloud provider or bare metal
<details>
</details>
### OS version
<details>
# cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="413.92.202310210500-0"
VERSION_ID="4.13"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 413.92.202310210500-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.13/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.13"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.13"
OPENSHIFT_VERSION="4.13"
RHEL_VERSION="9.2"
OSTREE_VERSION="413.92.202310210500-0"
# uname -a
Linux XYZ1 5.14.0-284.36.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 5 08:11:31 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
</details>
### Install tools
<details>
</details>
### Kepler deployment config
<details>
On Kubernetes:
```console
$ oc get cm kepler-exporter-cm -n openshift-kepler-operator -o yaml
apiVersion: v1
data:
  BIND_ADDRESS: 0.0.0.0:9103
  CGROUP_METRICS: '*'
  CPU_ARCH_OVERRIDE: ""
  ENABLE_EBPF_CGROUPID: "true"
  ENABLE_GPU: "true"
  ENABLE_PROCESS_METRICS: "false"
  ENABLE_QAT: "false"
  EXPOSE_CGROUP_METRICS: "true"
  EXPOSE_HW_COUNTER_METRICS: "true"
  EXPOSE_IRQ_COUNTER_METRICS: "true"
  EXPOSE_KUBELET_METRICS: "true"
  KEPLER_LOG_LEVEL: "1"
  KEPLER_NAMESPACE: openshift-kepler-operator
  METRIC_PATH: /metrics
  MODEL_CONFIG: CONTAINER_COMPONENTS_ESTIMATOR=false
  REDFISH_PROBE_INTERVAL_IN_SECONDS: "60"
  REDFISH_SKIP_SSL_VERIFY: "true"
kind: ConfigMap
metadata:
  creationTimestamp: "2024-01-05T00:51:36Z"
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/managed-by: kepler-operator
    app.kubernetes.io/part-of: kepler
    sustainable-computing.io/app: kepler
  name: kepler-exporter-cm
  namespace: openshift-kepler-operator
  ownerReferences:
  - apiVersion: kepler.system.sustainable.computing.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Kepler
    name: kepler
    uid: 1d726dc5-4e3a-4e00-ad82-72c62728b414
  resourceVersion: "18871948"
  uid: 760f221f-3230-41d5-9bcd-3b028132bc9b
$ oc -n openshift-operators describe deployment kepler-operator-controller
Name: kepler-operator-controller
Namespace: openshift-operators
CreationTimestamp: Fri, 05 Jan 2024 01:50:23 +0100
Labels: app.kubernetes.io/component=manager
app.kubernetes.io/instance=controller-manager
app.kubernetes.io/name=deployment
app.kubernetes.io/part-of=kepler-operator
olm.deployment-spec-hash=7755955f67
olm.owner=kepler-operator.v0.9.2
olm.owner.kind=ClusterServiceVersion
olm.owner.namespace=openshift-operators
operators.coreos.com/kepler-operator.openshift-operators=
Annotations: deployment.kubernetes.io/revision: 1
Selector: app.kubernetes.io/component=manager,app.kubernetes.io/instance=controller-manager,app.kubernetes.io/part-of=kepler-operator
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app.kubernetes.io/component=manager
app.kubernetes.io/instance=controller-manager
app.kubernetes.io/part-of=kepler-operator
Annotations: alm-examples:
[
{
"apiVersion": "kepler.system.sustainable.computing.io/v1alpha1",
"kind": "Kepler",
"metadata": {
"labels": {
"app.kubernetes.io/instance": "kepler",
"app.kubernetes.io/name": "kepler",
"app.kubernetes.io/part-of": "kepler-operator"
},
"name": "kepler"
},
"spec": {
"exporter": {
"deployment": {
"port": 9103
}
}
}
}
]
capabilities: Basic Install
categories: Monitoring
containerImage: quay.io/sustainable_computing_io/kepler-operator:0.9.2
createdAt: 2023-11-01T12:15:43Z
description: Deploys and Manages Kepler on Kubernetes
kubectl.kubernetes.io/default-container: manager
olm.operatorGroup: global-operators
olm.operatorNamespace: openshift-operators
olm.targetNamespaces:
operatorframework.io/properties:
{"properties":[{"type":"olm.gvk","value":{"group":"kepler.system.sustainable.computing.io","kind":"Kepler","version":"v1alpha1"}},{"type":...
operators.operatorframework.io/builder: operator-sdk-v1.27.0
operators.operatorframework.io/project_layout: go.kubebuilder.io/v3
repository: https://github.com/sustainable-computing-io/kepler-operator
Service Account: kepler-operator-controller-manager
Containers:
manager:
Image: quay.io/sustainable_computing_io/kepler-operator:0.9.2
Port: 8080/TCP
Host Port: 0/TCP
Command:
/manager
Args:
--openshift
--leader-elect
--kepler.image=$(RELATED_IMAGE_KEPLER)
--kepler.image.libbpf=$(RELATED_IMAGE_KEPLER_LIBBPF)
--zap-log-level=5
Limits:
cpu: 500m
memory: 128Mi
Requests:
cpu: 10m
memory: 64Mi
Liveness: http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
Environment:
RELATED_IMAGE_KEPLER: quay.io/sustainable_computing_io/kepler:release-0.6.1
RELATED_IMAGE_KEPLER_LIBBPF: quay.io/sustainable_computing_io/kepler:release-0.6.1-libbpf
OPERATOR_CONDITION_NAME: kepler-operator.v0.9.2
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: kepler-operator-controller-5d5767d64f (1/1 replicas created)
Events: <none>
```
</details>
### Container runtime (CRI) and version (if applicable)
<details>
</details>
### Related plugins (CNI, CSI, ...) and versions (if applicable)
<details>
</details>
AFAIK CPU isolation removes a set of CPUs from the kernel's scheduling algorithm. Kepler attaches a probe to the kernel's sched_switch tracepoint to calculate how much CPU time / how many CPU cycles a process is using, and attributes power usage based on the process's CPU time/cycles.
So if a process is using a CPU that is outside the scheduler, the probe may not fire for that CPU, Kepler may not know the process's CPU time/cycles to assign any power usage to it, and may not generate metrics for it.
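One way to test this hypothesis on the affected node would be to count sched_switch events per CPU while the workload runs; if the isolated cores show few or no events, the probe has nothing to account for there. A sketch, assuming bpftrace is available on the node (or in a privileged debug pod):

```console
# Count sched_switch events per CPU for ~10 seconds; isolated cores running a
# single pinned, busy task are expected to show few or no context switches
$ bpftrace -e 'tracepoint:sched:sched_switch { @switches[cpu] = count(); } interval:s:10 { exit(); }'
```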
Cc: @rootfs @marceloamaral
In such a case, could we obtain power usage metrics in alternative ways, even if they're not as detailed as with the eBPF approach? For instance, to work around the issue described here I used the output of the ipmitool sdr command. It provides summarised power usage across all CPUs and memory installed in the system - still better than nothing ;-)
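A minimal sketch of such a node-level reading (sensor names vary per BMC vendor, so the grep pattern below is only an example):

```console
# Dump the BMC sensor data records and keep the power-related readings
$ ipmitool sdr list full | grep -iE 'pwr|power|watt'
```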
@rszmigiel would you please use the kepler 0.7.2 container image?
cc @vprashar2929 @sthaha
I've used kepler-operator-bundle:0.10.0 and it works!
Thank you!
great news! thanks for the update @rszmigiel
@rootfs I am really curious to know why it worked with libbpf but not with bcc. That's the only difference between the two Kepler versions.
The approach to calculating the CPU cycles is the same in both.
It seems it's still happening with the latest available version (left side of the graph), compared to 0.7.2 (reinstalled, on the right side):
Reopening issue to continue investigation.
I tried to reproduce this scenario. On a machine with 20 cores, I isolated 2 cores and executed stress-ng on these isolated cores. Kepler is able to get the energy usage for these processes.
Screencast from 2024-08-29 19-11-31.webm
Since the cores are isolated, any task started without CPU pinning will not be allocated to the isolated cores; in this case CPUs 2 and 12 will not be loaded.
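For example, to actually load the isolated cores 2 and 12 in that setup, the stress workers would have to be pinned explicitly (a sketch):

```console
# Pin two stress-ng CPU workers to the isolated cores 2 and 12
$ taskset -c 2,12 stress-ng --cpu 2 --timeout 10m --metrics-brief
```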
Cc: @iconeb PTAL
I confirm we have a performance profile with reserved and isolated CPUs:
# oc get performanceprofile upf-performance-profile -o json | jq -r .spec.cpu
{
  "isolated": "2-31,34-63,66-95,98-127",
  "reserved": "0-1,32-33,64-65,96-97"
}
They are correctly applied at the worker node's boot:
# oc debug node/compute-0.0x4954.openshift.one -- cat /proc/cmdline
[...] intel_iommu=on iommu=pt systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=1 skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-31,34-63,66-95,98-127 tuned.non_isolcpus=00000003,00000003,00000003,00000003 systemd.cpu_affinity=0,1,32,64,33,65,96,97 intel_iommu=on iommu=pt isolcpus=managed_irq,2-31,34-63,66-95,98-127 nohz_full=2-31,34-63,66-95,98-127 nosoftlockup nmi_watchdog=0 mce=off rcutree.kthread_prio=11 default_hugepagesz=1G hugepagesz=1G hugepages=200 idle=poll rcu_nocb_poll tsc=perfect selinux=0 enforcing=0 noswap clock=pit audit=0 processor.max_cstate=1 intel_idle.max_cstate=0 rcupdate.rcu_normal_after_boot=0 softlockup_panic=0 console=ttyS0,115200n8 pcie_aspm=off pci=noaer firmware_class.path=/var/lib/firmware intel_pstate=disable
The Pod is running with requests and limits:
$ oc get pod upf1 -o json | jq .spec.containers[0].resources
{
  "limits": {
    "cpu": "18",
    "hugepages-1Gi": "40Gi",
    "memory": "30Gi",
    "openshift.io/ens785_rn": "3"
  },
  "requests": {
    "cpu": "18",
    "hugepages-1Gi": "40Gi",
    "memory": "30Gi",
    "openshift.io/ens785_rn": "3"
  }
}
And on the worker node the taskset affinity is assigned as expected:
taskset -pc 656722
pid 656722's current affinity list: 3-7,22-25,67-71,86-89
The strange thing is that the previous graph was created by running the same Pod(s) in the same environment, only changing Kepler's version in the meantime.
I will try another round of tests to provide (if possible) further evidence.
I have tested Kepler on RHEL booted with isolated CPUs. The isolated CPUs were assigned to a VM. Kepler can capture the VM and report metrics. We have added this configuration to our CI.
@rszmigiel I am closing this as the new version of Kepler (0.10.0 and above) should contain a fix for this. If this isn't the case, could you please open a new bug?