Missing support for linux/arm64/v8

Open beneiltis opened this issue 1 year ago • 20 comments

What happened?

First of all: Great work. The operator is working great for me 👍. Thx a lot for your great work.

When I deploy the operator to my Kubernetes cluster (k3s), I receive the error:

Failed to pull image "quay.io/sustainable_computing_io/kepler:release-0.6.1": no matching manifest for linux/arm64/v8 in the manifest list entries

As far as I can see, there is only an amd64 image in the container registry. Do you have any plans for multi-arch support?
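
The pull error means the manifest list for the tag has no entry matching the node's platform. Below is a minimal Go sketch of that check, assuming the manifest list JSON has already been fetched (for example with `docker manifest inspect`). The `hasPlatform` helper and the sample JSON are illustrative assumptions, not part of Kepler or of any registry client; only the `manifests[].platform` field names follow the OCI image index layout.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// platform mirrors the "platform" object of an image index entry.
type platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
	Variant      string `json:"variant,omitempty"`
}

// imageIndex holds only the fields of a manifest list needed here.
type imageIndex struct {
	Manifests []struct {
		Platform platform `json:"platform"`
	} `json:"manifests"`
}

// hasPlatform reports whether the manifest list JSON contains an entry for
// the given OS and architecture. The variant (for example "v8") is ignored.
func hasPlatform(indexJSON []byte, goos, arch string) (bool, error) {
	var idx imageIndex
	if err := json.Unmarshal(indexJSON, &idx); err != nil {
		return false, err
	}
	for _, m := range idx.Manifests {
		if m.Platform.OS == goos && m.Platform.Architecture == arch {
			return true, nil
		}
	}
	return false, nil
}

// sampleIndex is made-up data for illustration, not the real Kepler manifest.
const sampleIndex = `{"manifests":[{"platform":{"architecture":"amd64","os":"linux"}}]}`

func main() {
	ok, err := hasPlatform([]byte(sampleIndex), "linux", "arm64")
	fmt.Println("linux/arm64 present:", ok, "error:", err)
}
```

With an amd64-only index like the sample, the arm64 lookup reports that no matching entry exists, which is the situation the pull error describes.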

What did you expect to happen?

A working image pull for arm64 systems. In this case: an M2 Mac.

How can we reproduce it (as minimally and precisely as possible)?

Run the Helm chart on an M1 or M2 Mac in Docker Desktop Kubernetes.

Anything else we need to know?

No response

Kepler image tag

quay.io/sustainable_computing_io/kepler:release-0.6.1

Kubernetes version

v1.28.2

Cloud provider or bare metal

OS version

# On Linux:
$ cat /etc/os-release
macos 13.4.1 
$ uname -a
Darwin MBP 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun  8 22:22:20 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6000 arm64

Install tools

Kepler deployment config

For on kubernetes:

$ KEPLER_NAMESPACE=kepler

# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE} 
# paste output here

# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE} 

For standalone:

put your Kepler command argument here

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

beneiltis avatar Jan 03 '24 11:01 beneiltis

@beneiltis thanks for testing Kepler on ARM. We are still working on the multi-arch image build at the moment. The ARM platform Kepler currently supports is Ampere, since the Ampere CPU has a hwmon that reports power consumption. If we knew how to get power readings from e.g. Apple silicon, we would love to support it too.

cc @vimalk78
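
For readers unfamiliar with hwmon, here is a rough Go sketch of reading such a sensor. It assumes the sensor is exposed as `power1_input` under `/sys/class/hwmon/hwmon*` and that the value is in microwatts, as the Linux hwmon sysfs convention specifies. The helper names are hypothetical and this is not Kepler's actual reader.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseMicrowatts converts the contents of a hwmon power*_input file
// (an integer number of microwatts) to watts.
func parseMicrowatts(raw string) (float64, error) {
	uw, err := strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(uw) / 1e6, nil
}

// findPowerInputs returns every power1_input file below the hwmon root.
func findPowerInputs(root string) ([]string, error) {
	return filepath.Glob(filepath.Join(root, "hwmon*", "power1_input"))
}

func main() {
	paths, err := findPowerInputs("/sys/class/hwmon")
	if err != nil || len(paths) == 0 {
		fmt.Println("no hwmon power sensor found")
		return
	}
	for _, p := range paths {
		raw, err := os.ReadFile(p)
		if err != nil {
			fmt.Println("read failed:", err)
			continue
		}
		watts, err := parseMicrowatts(string(raw))
		if err != nil {
			fmt.Println("parse failed:", err)
			continue
		}
		fmt.Printf("%s: %.2f W\n", p, watts)
	}
}
```

On a machine without such a sensor (for example a VM on Apple silicon), the sketch simply reports that nothing was found.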

rootfs avatar Jan 03 '24 14:01 rootfs

Good to know @rootfs. I am no specialist in this field, but I can spend a night looking into it. Maybe I will find something to contribute.

beneiltis avatar Jan 03 '24 15:01 beneiltis

@rootfs To enable arm64 with the latest Kepler version, we need to change how cpuid is installed in our build process: the package only has an x86 version, so the arm64 image build fails. Here are some suggestions:

  1. Remove the cpuid install from the build image.
  2. Remove the cpuid file copy from the build image to the Kepler image.
  3. Install cpuid during the image build for x86 only. Optionally, for build performance: I am not sure we need all features from elfutils. Is there any way we can install elfutils from RPM packages for x86, arm64 and s390?

SamYuan1990 avatar Jan 05 '24 14:01 SamYuan1990

there seems to be no release of cpuid for arm

sh-5.1# yum install -y cpuid 
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered with an entitlement server. You can use subscription-manager to register.

Extra Packages for Enterprise Linux 9 - aarch64                                                                                                                                                                557 kB/s |  20 MB     00:36    
Extra Packages for Enterprise Linux 9 openh264 (From Cisco) - aarch64                                                                                                                                          572  B/s | 2.5 kB     00:04    
No match for argument: cpuid
Error: Unable to find a match: cpuid

vimalk78 avatar Jan 08 '24 09:01 vimalk78

https://github.com/sustainable-computing-io/kepler/pull/1169 tries to support multiple architectures for the base image.

SamYuan1990 avatar Jan 08 '24 10:01 SamYuan1990

Great. Thx @SamYuan1990 :-) What do we now have to do in order to make it run on Apple silicon? I only found powermetrics --show-process-energy, which is a good starting point for me.

beneiltis avatar Jan 15 '24 19:01 beneiltis

Ok I checked the repo out and due to the changes I can now build the Dockerfile.builder and Dockerfile and run it on my mac with the correct architecture. Awesome.

When I replace the DaemonSet image with my self-built image, I get the following errors:

I0116 07:44:19.822600 1 gpu.go:46] Failed to init nvml, err: failed to init nvml. ERROR_LIBRARY_NOT_FOUND
E0116 07:44:19.824146 1 utils.go:140] getCPUArch failure: open /sys/devices/cpu/caps/pmu_name: no such file or directory
I0116 07:44:19.826799 1 qat.go:35] Failed to init qat-telemtry err: could not get qat status exit status 127
I0116 07:44:19.839222 1 exporter.go:155] Kepler running on version: 1.20.10
I0116 07:44:19.839256 1 config.go:275] using gCgroup ID in the BPF program: true
I0116 07:44:19.839288 1 config.go:277] kernel version: 6.5
I0116 07:44:19.839340 1 exporter.go:167] LibbpfBuilt: true, BccBuilt: false
I0116 07:44:19.839343 1 exporter.go:186] EnabledBPFBatchDelete: true
I0116 07:44:19.839382 1 rapl_msr_util.go:129] failed to open path /dev/cpu/0/msr: no such file or directory
I0116 07:44:19.839442 1 power.go:72] Unable to obtain power, use estimate method
I0116 07:44:19.839462 1 redfish.go:169] failed to get redfish credential file path
I0116 07:44:19.839485 1 acpi.go:67] Could not find any ACPI power meter path. Is it a VM?
I0116 07:44:19.839513 1 power.go:72] using none to obtain power
I0116 07:44:19.839524 1 exporter.go:201] Initializing the GPU collector
I0116 07:44:25.841021 1 watcher.go:66] Using in cluster k8s config
libbpf: map 'cpu_instructions': found type = 2.
libbpf: map 'cpu_instructions': found key [6], sz = 4.
libbpf: map 'cpu_instructions': found value [12], sz = 8.
libbpf: map 'cpu_instructions': found max_entries = 128.
libbpf: map 'cache_miss_hc_reader': at sec_idx 13, offset 256.
libbpf: map 'cache_miss_hc_reader': found type = 4.
libbpf: map 'cache_miss_hc_reader': found key [2], sz = 4.
libbpf: map 'cache_miss_hc_reader': found value [6], sz = 4.
libbpf: map 'cache_miss_hc_reader': found max_entries = 128.
libbpf: map 'cache_miss': at sec_idx 13, offset 288.
libbpf: map 'cache_miss': found type = 2.
libbpf: map 'cache_miss': found key [6], sz = 4.
libbpf: map 'cache_miss': found value [12], sz = 8.
libbpf: map 'cache_miss': found max_entries = 128.
libbpf: map 'cpu_freq_array': at sec_idx 13, offset 320.
libbpf: map 'cpu_freq_array': found type = 2.
libbpf: map 'cpu_freq_array': found key [6], sz = 4.
libbpf: map 'cpu_freq_array': found value [6], sz = 4.
libbpf: map 'cpu_freq_array': found max_entries = 128.
libbpf: map 'arm64_ke.data' (global data): at sec_idx 11, offset 0, flags 400.
libbpf: map 11 is "arm64_ke.data"
libbpf: map 'arm64_ke.bss' (global data): at sec_idx 12, offset 0, flags 400.
libbpf: map 12 is "arm64_ke.bss"
libbpf: sec '.reltracepoint/sched/sched_switch': collecting relocation for section(3) 'tracepoint/sched/sched_switch'
libbpf: sec '.reltracepoint/sched/sched_switch': relo #0: insn #2 against 'sample_rate'
libbpf: prog 'kepler_trace': found data map 11 (arm64_ke.data, sec 11, off 0) for insn 2
libbpf: sec '.reltracepoint/sched/sched_switch': relo #1: insn #6 against 'counter_sched_switch'
libbpf: prog 'kepler_trace': found data map 12 (arm64_ke.bss, sec 12, off 0) for insn 6
libbpf: sec '.reltracepoint/sched/sched_switch': relo #2: insn #32 against 'cpu_cycles_hc_reader'
libbpf: prog 'kepler_trace': found map 2 (cpu_cycles_hc_reader, sec 13, off 64) for insn #32
libbpf: sec '.reltracepoint/sched/sched_switch': relo #3: insn #51 against 'cpu_cycles'
libbpf: prog 'kepler_trace': found map 3 (cpu_cycles, sec 13, off 96) for insn #51
libbpf: sec '.reltracepoint/sched/sched_switch': relo #4: insn #65 against 'cpu_cycles'
libbpf: prog 'kepler_trace': found map 3 (cpu_cycles, sec 13, off 96) for insn #65
libbpf: sec '.reltracepoint/sched/sched_switch': relo #5: insn #70 against 'cpu_ref_cycles_hc_reader'
libbpf: prog 'kepler_trace': found map 4 (cpu_ref_cycles_hc_reader, sec 13, off 128) for insn #70
libbpf: sec '.reltracepoint/sched/sched_switch': relo #6: insn #83 against 'cpu_ref_cycles'
libbpf: prog 'kepler_trace': found map 5 (cpu_ref_cycles, sec 13, off 160) for insn #83
libbpf: sec '.reltracepoint/sched/sched_switch': relo #7: insn #97 against 'cpu_ref_cycles'
libbpf: prog 'kepler_trace': found map 5 (cpu_ref_cycles, sec 13, off 160) for insn #97
libbpf: sec '.reltracepoint/sched/sched_switch': relo #8: insn #102 against 'cpu_instructions_hc_reader'
libbpf: prog 'kepler_trace': found map 6 (cpu_instructions_hc_reader, sec 13, off 192) for insn #102
libbpf: sec '.reltracepoint/sched/sched_switch': relo #9: insn #117 against 'cpu_instructions'
libbpf: prog 'kepler_trace': found map 7 (cpu_instructions, sec 13, off 224) for insn #117
libbpf: sec '.reltracepoint/sched/sched_switch': relo #10: insn #129 against 'cpu_instructions'
libbpf: prog 'kepler_trace': found map 7 (cpu_instructions, sec 13, off 224) for insn #129
libbpf: sec '.reltracepoint/sched/sched_switch': relo #11: insn #134 against 'cache_miss_hc_reader'
libbpf: prog 'kepler_trace': found map 8 (cache_miss_hc_reader, sec 13, off 256) for insn #134
libbpf: sec '.reltracepoint/sched/sched_switch': relo #12: insn #146 against 'cache_miss'
libbpf: prog 'kepler_trace': found map 9 (cache_miss, sec 13, off 288) for insn #146
libbpf: sec '.reltracepoint/sched/sched_switch': relo #13: insn #160 against 'cache_miss'
libbpf: prog 'kepler_trace': found map 9 (cache_miss, sec 13, off 288) for insn #160
libbpf: sec '.reltracepoint/sched/sched_switch': relo #14: insn #168 against 'cpu_freq_array'
libbpf: prog 'kepler_trace': found map 10 (cpu_freq_array, sec 13, off 320) for insn #168
libbpf: sec '.reltracepoint/sched/sched_switch': relo #15: insn #182 against 'cpu_freq_array'
libbpf: prog 'kepler_trace': found map 10 (cpu_freq_array, sec 13, off 320) for insn #182
libbpf: sec '.reltracepoint/sched/sched_switch': relo #16: insn #194 against 'cpu_freq_array'
libbpf: prog 'kepler_trace': found map 10 (cpu_freq_array, sec 13, off 320) for insn #194
libbpf: sec '.reltracepoint/sched/sched_switch': relo #17: insn #218 against 'cpu_freq_array'
libbpf: prog 'kepler_trace': found map 10 (cpu_freq_array, sec 13, off 320) for insn #218
libbpf: sec '.reltracepoint/sched/sched_switch': relo #18: insn #227 against 'pid_time'
libbpf: prog 'kepler_trace': found map 1 (pid_time, sec 13, off 32) for insn #227
libbpf: sec '.reltracepoint/sched/sched_switch': relo #19: insn #235 against 'pid_time'
libbpf: prog 'kepler_trace': found map 1 (pid_time, sec 13, off 32) for insn #235
libbpf: sec '.reltracepoint/sched/sched_switch': relo #20: insn #247 against 'pid_time'
libbpf: prog 'kepler_trace': found map 1 (pid_time, sec 13, off 32) for insn #247
libbpf: sec '.reltracepoint/sched/sched_switch': relo #21: insn #253 against 'processes'
libbpf: prog 'kepler_trace': found map 0 (processes, sec 13, off 0) for insn #253
libbpf: sec '.reltracepoint/sched/sched_switch': relo #22: insn #273 against 'processes'
libbpf: prog 'kepler_trace': found map 0 (processes, sec 13, off 0) for insn #273
libbpf: sec '.reltracepoint/sched/sched_switch': relo #23: insn #300 against 'processes'
libbpf: prog 'kepler_trace': found map 0 (processes, sec 13, off 0) for insn #300
libbpf: sec '.reltracepoint/irq/softirq_entry': collecting relocation for section(5) 'tracepoint/irq/softirq_entry'
libbpf: sec '.reltracepoint/irq/softirq_entry': relo #0: insn #5 against 'processes'
libbpf: prog 'kepler_irq_trace': found map 0 (processes, sec 13, off 0) for insn #5
libbpf: sec '.relkprobe/mark_page_accessed': collecting relocation for section(7) 'kprobe/mark_page_accessed'
libbpf: sec '.relkprobe/mark_page_accessed': relo #0: insn #4 against 'processes'
libbpf: prog 'kprobe__mark_page_accessed': found map 0 (processes, sec 13, off 0) for insn #4
libbpf: sec '.relkprobe/set_page_dirty': collecting relocation for section(9) 'kprobe/set_page_dirty'
libbpf: sec '.relkprobe/set_page_dirty': relo #0: insn #4 against 'processes'
libbpf: prog 'kprobe__set_page_dirty': found map 0 (processes, sec 13, off 0) for insn #4
libbpf: map 'processes': created successfully, fd=9
libbpf: map 'pid_time': created successfully, fd=10
libbpf: map 'cpu_cycles_hc_reader': created successfully, fd=11
libbpf: map 'cpu_cycles': created successfully, fd=12
libbpf: map 'cpu_ref_cycles_hc_reader': created successfully, fd=13
libbpf: map 'cpu_ref_cycles': created successfully, fd=14
libbpf: map 'cpu_instructions_hc_reader': created successfully, fd=15
libbpf: map 'cpu_instructions': created successfully, fd=16
libbpf: map 'cache_miss_hc_reader': created successfully, fd=17
libbpf: map 'cache_miss': created successfully, fd=18
libbpf: map 'cpu_freq_array': created successfully, fd=19
libbpf: map 'arm64_ke.data': created successfully, fd=20
libbpf: map 'arm64_ke.bss': created successfully, fd=21
libbpf: failed to open '/sys/kernel/tracing/events/sched/sched_switch/id': No such file or directory
libbpf: failed to determine tracepoint 'sched/sched_switch' perf event ID: No such file or directory
libbpf: prog 'kepler_trace': failed to create tracepoint 'sched/sched_switch' perf event: No such file or directory
I0116 07:44:25.953230 1 bpf_perf.go:135] failed to attach bpf with libbpf: failed to attach sched/sched_switch: failed to attach tracepoint sched_switch to program kepler_trace: no such file or directory, fall back to bcc attachment
I0116 07:44:25.953312 1 exporter.go:237] failed to start : failed to attach bpf assets: no bcc build tag
I0116 07:44:25.953385 1 exporter.go:269] Started Kepler in 6.114231877s

As far as I can see from the GitHub workflows, you are not using ARM runners. If you like, we can contribute our runners to the project. We would be more than happy to help :-)

beneiltis avatar Jan 16 '24 07:01 beneiltis

@beneiltis you are more than welcome to contribute to the project :)

marceloamaral avatar Jan 16 '24 13:01 marceloamaral

there seems to be no release of cpuid for arm

Please note that cpuid is a tool for detecting x86 CPU features and capabilities; its author is Todd Allen. You can also find ARM-related functionality on his website, which I believe is more useful for ARM CPU model identification than the current code in Kepler.

Furthermore, my recent feature commit adds an alternative to cpuid: since cpuid is not available on ARM platforms, we can use the ARM CPU section in cpus.yaml to maintain the known ARM CPU models as a workaround.

jiere avatar Jan 17 '24 14:01 jiere

@beneiltis thanks for the input! On what platform did you run Kepler and eBPF?

rootfs avatar Jan 17 '24 15:01 rootfs

Great. Thx @SamYuan1990 :-) What do we now have to do in order to make it run on Apple silicon? I only found powermetrics --show-process-energy, which is a good starting point for me.

Well, to be honest, as we discussed in the Kepler community meeting, I just built the arm64 image from the latest code base. As @jiere said, I suppose we need to discuss this further. Since cpuid is x86-only, maybe we need a build tag for that part of the code so that it does not break arm64 or s390x (cc @jiangphcn for awareness). As @rootfs said, and if I understand correctly, the arm64 version of the code currently only supports Redfish, and you need to configure it correctly.
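
A build tag such as `//go:build amd64` would exclude the cpuid code at compile time, but it needs one file per architecture. The single-file Go sketch below shows the same idea with a runtime check on `runtime.GOARCH`. Both function names are hypothetical and the fallback string only stands in for whatever Kepler would actually do on non-x86 platforms.

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuidSupported reports whether the x86-only cpuid path can be used
// for the given GOARCH value.
func cpuidSupported(goarch string) bool {
	return goarch == "amd64" || goarch == "386"
}

// detectionStrategy is a hypothetical stand-in that only names the path
// that would be taken; it performs no real CPU detection.
func detectionStrategy(goarch string) string {
	if cpuidSupported(goarch) {
		return "cpuid"
	}
	return "fallback (for example a static CPU model table)"
}

func main() {
	fmt.Println(runtime.GOARCH, "->", detectionStrategy(runtime.GOARCH))
}
```

The compile-time variant is usually preferable when the x86 path pulls in assembly or cgo that cannot be built on other architectures at all.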

SamYuan1990 avatar Jan 19 '24 13:01 SamYuan1990

@beneiltis @YaSuenag latest kepler supports multiarch (thanks to @SamYuan1990 ), can you give it a try?

rootfs avatar Jan 25 '24 14:01 rootfs

I am using Apple M1 Max, macOS 13.4.1.

Yes, the container now starts correctly. Awesome :-)

But I don't get any energy readings. I did not expect any, though, because Kepler does not support Apple silicon, right?

However, if I now update my legacy testing cluster (12-year-old hardware), I get a nil pointer dereference. I am posting it here, but I can also create a new ticket for it:

I0125 15:16:47.276278 1 libbpf_attacher.go:188] Successfully load eBPF module from libbpf object
I0125 15:16:47.276372 1 process_energy.go:114] Using the Ratio/DynPower Power Model to estimate Process Platform Power
I0125 15:16:47.276383 1 process_energy.go:115] Process feature names: [cpu_instructions]
I0125 15:16:47.276458 1 process_energy.go:124] Using the Ratio/DynPower Power Model to estimate Process Component Power
I0125 15:16:47.276467 1 process_energy.go:125] Process feature names: [cpu_instructions cpu_instructions cache_miss gpu_sm_util]
I0125 15:16:47.276484 1 process_energy.go:114] Using the Ratio/DynPower Power Model to estimate Process Platform Power
I0125 15:16:47.276492 1 process_energy.go:115] Process feature names: [cpu_instructions]
I0125 15:16:47.276507 1 process_energy.go:124] Using the Ratio/DynPower Power Model to estimate Process Component Power
I0125 15:16:47.276533 1 process_energy.go:125] Process feature names: [cpu_instructions cpu_instructions cache_miss gpu_sm_util]
I0125 15:16:47.276877 1 node_platform_energy.go:52] Using the LinearRegressor/AbsPower Power Model to estimate Node Platform Power
I0125 15:16:47.277032 1 exporter.go:269] Started Kepler in 155.044557ms
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x827598]

goroutine 16 [running]:
github.com/sustainable-computing-io/kepler/pkg/collector/stats/types.(*UInt64StatCollection).AddDeltaStat(0x0, {0x1998506, 0x7}, 0x0)
    /workspace/pkg/collector/stats/types/types.go:108 +0x38
github.com/sustainable-computing-io/kepler/pkg/collector/resourceutilization/bpf.updateSWCounters(0x178f120?, 0xc0003c26c0, 0x1ba21?)
    /workspace/pkg/collector/resourceutilization/bpf/process_bpf_collector.go:43 +0x117
github.com/sustainable-computing-io/kepler/pkg/collector/resourceutilization/bpf.UpdateProcessBPFMetrics(0xc0001776f0?)
    /workspace/pkg/collector/resourceutilization/bpf/process_bpf_collector.go:121 +0x69c
github.com/sustainable-computing-io/kepler/pkg/collector.(*Collector).updateProcessResourceUtilizationMetrics(0xc000477950?, 0x0?)
    /workspace/pkg/collector/metric_collector.go:200 +0x54
github.com/sustainable-computing-io/kepler/pkg/collector.(*Collector).updateResourceUtilizationMetrics(0xc000477950)
    /workspace/pkg/collector/metric_collector.go:159 +0x56
github.com/sustainable-computing-io/kepler/pkg/collector.(*Collector).Update(0xb2d05e00?)
    /workspace/pkg/collector/metric_collector.go:110 +0x48
github.com/sustainable-computing-io/kepler/pkg/manager.(*CollectorManager).Start.func1()
    /workspace/pkg/manager/manager.go:73 +0x7b
created by github.com/sustainable-computing-io/kepler/pkg/manager.(*CollectorManager).Start
    /workspace/pkg/manager/manager.go:65 +0x6a
Stream closed EOF for mogenius/kepler-s5s6z (kepler-exporter)
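
In the trace, the first argument of `AddDeltaStat` is `0x0`, meaning the method was called on a nil `*UInt64StatCollection`. The Go sketch below shows a defensive nil-receiver guard on a simplified, hypothetical stand-in type; it does not reproduce Kepler's real type or its actual fix.

```go
package main

import "fmt"

// statCollection is a hypothetical, simplified stand-in for Kepler's
// UInt64StatCollection; the real type has more fields and methods.
type statCollection struct {
	deltas map[string]uint64
}

// AddDeltaStat adds delta to the counter stored under key. A nil receiver
// is reported as an error instead of being dereferenced.
func (s *statCollection) AddDeltaStat(key string, delta uint64) error {
	if s == nil {
		return fmt.Errorf("stat collection is nil, dropping %q", key)
	}
	if s.deltas == nil {
		s.deltas = make(map[string]uint64)
	}
	s.deltas[key] += delta
	return nil
}

func main() {
	var missing *statCollection // e.g. a metric that was never initialized
	fmt.Println(missing.AddDeltaStat("example", 0))

	present := &statCollection{}
	fmt.Println(present.AddDeltaStat("example", 3), present.deltas["example"])
}
```

A guard like this turns the crash into a reportable error, though the underlying question of why the collection was never initialized on that hardware would still need an answer.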

beneiltis avatar Jan 25 '24 15:01 beneiltis

@beneiltis Kepler doesn't have an Apple M1 energy sensor yet; it is something we haven't started.

Btw, I had to disable the arm64 image build because libbpf has an architecture dependency. Will let you know when this is fixed.

rootfs avatar Jan 25 '24 19:01 rootfs

I do not currently have the Arm server at my disposal, so I cannot evaluate the image now, sorry. (I believe I will be able to in a few months...)

YaSuenag avatar Jan 25 '24 23:01 YaSuenag

fix is in #1255

rootfs avatar Feb 23 '24 23:02 rootfs

Ok, arm64 is now working 👍 Now I guess we/I need to come up with something to support Apple silicon (M1-M3). To be honest, I guess this is an edge case (who would run a real cluster on their MacBook?), but it would be awesome to demonstrate Kepler's abilities on a local setup. The compute/power ratio of these systems is really incredible.

beneiltis avatar Feb 28 '24 14:02 beneiltis

Sorry for the late reply.

I tried the latest Kepler ( quay.io/sustainable_computing_io/kepler:release-0.7.8 ) on Fedora 39 on Altra Q80-30 (HPE RL300) with Kubernetes v1.29. It looks good, but I saw some strange logs:

I0401 06:54:57.584191       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:54:57.584199       1 power.go:67] use Ampere Xgene sysfs to obtain power

  <snip>

I0401 06:54:57.597915       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:54:57.598157       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:54:57.598406       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:54:57.598671       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input

  <snip>

libbpf: prog 'kprobe__finish_task_switch': failed to create kprobe 'finish_task_switch+0x0' perf event: No such file or directory
I0401 06:54:57.730645       1 libbpf_attacher.go:128] failed to attach kprobe/finish_task_switch: failed to attach finish_task_switch k(ret)probe to program kprobe__finish_task_switch: no such file or directory. Try finish_task_switch.isra.0 -> (1)

  <snip>

I0401 06:54:57.767582       1 libbpf_attacher.go:195] Successfully load eBPF module from libbpf object
I0401 06:54:57.767626       1 process_energy.go:114] Using the Ratio/DynPower Power Model to estimate Process Platform  Power -> (2)

  <snip>

I0401 06:54:57.768211       1 exporter.go:270] Started Kepler in 184.47496ms
I0401 06:55:00.826484       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:55:00.827290       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:55:03.793142       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:55:03.793628       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:55:06.787461       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:55:06.787998       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:55:09.788545       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input
I0401 06:55:09.789155       1 apm_xgene_sysfs.go:61] Found power input file: /sys/class/hwmon/hwmon0/power1_input

  <snip>

-> (3)
  1. It looks like probing finish_task_switch fails. Should we fix this? My kernel (6.7.10-200.fc39.aarch64) has finish_task_switch.isra.0 in /proc/kallsyms.
  2. Is the power model used by default? I'd like to use real measurement data only. Should I tweak something in values.yaml? I installed Kepler via Helm with the defaults (no values.yaml).
  3. I saw a lot of log entries from apm_xgene_sysfs.go:61. They seem to occur twice every 3 seconds. Is it a bug?

YaSuenag avatar Apr 01 '24 07:04 YaSuenag

@YaSuenag thanks for the update. For 1), Kepler loads the eBPF program and first tries to attach finish_task_switch; if that fails, it attaches finish_task_switch.isra.0. The error message "failed to attach kprobe/finish_task_switch: failed to attach finish_task_switch k(ret)probe to program kprobe__finish_task_switch: no such file or directory. Try finish_task_switch.isra.0 -> (1)" is benign.
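
As a different way to reach the same symbol without a failed first attempt, the Go sketch below picks the name up front from text in `/proc/kallsyms` format. This is an alternative lookup technique, not Kepler's attach-and-retry logic; `resolveKprobeSymbol` and the sample data are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// resolveKprobeSymbol scans text in /proc/kallsyms format
// ("address type name [module]") and returns the exact symbol if present,
// otherwise the first compiler-generated variant such as "name.isra.0".
func resolveKprobeSymbol(kallsyms, name string) (string, bool) {
	variant := ""
	sc := bufio.NewScanner(strings.NewReader(kallsyms))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 {
			continue
		}
		sym := fields[2]
		if sym == name {
			return sym, true
		}
		if variant == "" && strings.HasPrefix(sym, name+".") {
			variant = sym
		}
	}
	return variant, variant != ""
}

// sample is made-up data in /proc/kallsyms format.
const sample = `ffff800080100000 t finish_task_switch.isra.0
ffff800080100400 T schedule
`

func main() {
	sym, ok := resolveKprobeSymbol(sample, "finish_task_switch")
	fmt.Println(sym, ok)
}
```

Reading the real `/proc/kallsyms` instead of the sample string would require passing its contents in; addresses there may be zeroed for unprivileged readers, but the names remain usable for this check.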

For 2), the power model is not used in this case, since Kepler runs in a bare-metal environment.

For 3), yes, it is a bug. Can you create a PR that changes the verbosity level on this line from 1 to, e.g., 5? That will make the logs go away. Thanks.

rootfs avatar Apr 01 '24 14:04 rootfs

Thanks @rootfs ! I opened PR #1322. It works fine on my environment.

YaSuenag avatar Apr 02 '24 01:04 YaSuenag