Test scaphandre on ARM and document the success or failure and insights

Open bpetit opened this issue 3 years ago • 24 comments

Problem

Scaphandre relies only on RAPL for now, so it works on Intel x86 and may work on AMD x86 (more use cases needed). ARM use cases are growing in number, and it would be great to get first insights into whether scaphandre can run on such CPUs.

Solution

Look for alternatives to RAPL on ARM. Try something in a new branch, document, discuss.

Alternatives

I don't know.

Additional context

Maybe try on a Raspberry Pi? (I have some at home, but I don't think I will be able to test that before February.)

bpetit avatar Dec 23 '20 17:12 bpetit

I just got an Apple M1 so maybe I could test it on that machine in the following days.

lucabeetz avatar Dec 25 '20 03:12 lucabeetz

Hi! Thanks a lot, that would be great. However, you may face a double difficulty, as we haven't started any development for macOS. It would be interesting to dig into both the ARM and macOS scenarios, but I wanted to mention it.

bpetit avatar Dec 25 '20 06:12 bpetit

@bpetit I have a Raspberry Pi 4. I want to help, and I can test this repo on the Pi in a week, but I’m really a novice at contributing to open source and have had a hard time reaching out to people in the different repositories I liked. I did almost the same there as what I’m doing here: sending a message in the open.

Do I have to do something here or click or say or post somewhere to say that I will tackle this issue? So that others don’t have to waste time or something.

MikeTheSapien avatar Dec 27 '20 04:12 MikeTheSapien

Hi! This is great, thanks for your motivation! :grinning: Your message is enough to say that you'll work on it. It may be a tough one, as nothing has been done or researched yet about ARM and whether or not it is able to measure power consumption. It could also be a dead end (we could fall back on an estimation-like feature then). I think it would be great if everyone working on ARM posted updates in this thread and synchronized with each other. The gitter chat room may be useful for that too. I don't think it's an issue to have multiple people working on one topic, especially such an exploratory topic that may lead to multiple tasks, as long as there is a communication effort along the way.

bpetit avatar Dec 27 '20 07:12 bpetit

About macOS and M1: I don't know how much it helps, but this should be good news: https://github.com/rust-lang/rustup/issues/2413

bpetit avatar Dec 29 '20 17:12 bpetit

I've looked around, and as far as I can tell, there is no common tool for measuring power usage using only software for the Pi at least. Apple do have tools for the rest of their machines, but I don't have access to check.

Raspberry Pis

I have a raspberry Pi, that I'm running locally, and pretty much every guide mentions using hardware to track it.

This academic paper from 2016 on ResearchGate outlines the methods they used; they relied on an external power tracker: https://www.researchgate.net/publication/309917878_Power_consumption_of_the_Raspberry_Pi_A_comparative_analysis

This question on the RPi stack exchange shows some sample numbers for different scenarios: https://raspberrypi.stackexchange.com/questions/5033/how-much-energy-does-the-raspberry-pi-consume-in-a-day

The raspi.tv site has some more detail here: https://raspi.tv/2019/how-much-power-does-the-pi4b-use-power-measurements https://raspi.tv/2018/how-much-power-does-raspberry-pi-3b-use-power-measurements

Other Apple ARM-based machines

On Mac OS X you can call powermetrics; you need to be root, and probably need to have Xcode installed already. It spits out loaaaads of data, and is extremely configurable. It reads from RAPL as well, so presumably you could implement a Mac-based sensor for this too, to run scaphandre on Macs.

Their docs are pretty extensive with loads of links to tools, but in general they tend not to talk about measuring power usage in terms of watts, and instead focus on techniques you might use to reduce energy usage instead:

https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/MonitoringEnergyUsage.html#//apple_ref/doc/uid/TP40013929-CH24-SW1

Looking for stats from powermetrics on a M1 based mac.

The OS X powermetrics command spits out data from the following sampler groups: tasks,battery,network,disk,int_sources,devices,interrupts,cpu_power,gpu_power,gpu_agpm_stats,smc,thermal,sfi,nvme_ssd,io_throttle_ssd

You can focus on a single source with a call like below (this will leave out GPU power, power from HDDs, fans and so on):

sudo powermetrics  --samplers cpu_power

This will spit out a more manageable amount of data that could be read and reformatted into a form scaphandre could manipulate. There is also a way to format the output as a plist, which I think is a kind of XML. Maybe serde would be useful for converting it?

Machine model: MacBookPro14,2
SMC version: 2.44f5
EFI version: 428.0.0
OS version: 19H2
Boot arguments:
Boot time: Tue Dec 29 19:07:04 2020



*** Sampled system activity (Wed Dec 30 14:53:11 2020 +0100) (5034.40ms elapsed) ***


**** Processor usage ****

Intel energy model derived package power (CPUs+GT+SA): 6.01W

LLC flushed residency: 56%

System Average frequency as fraction of nominal: 99.59% (3485.62 Mhz)
Package 0 C-state residency: 57.93% (C2: 17.12% C3: 40.81% C6: 0.00% C7: 0.00% C8: 0.00% C9: 0.00% C10: 0.00% )

Performance Limited Due to:
CPU LIMIT ICCMAX/PL4/OTHER
CPU LIMIT MAX_TURBO_LIMIT
CPU LIMIT TURBO_ATTENUATION
CPU/GPU Overlap: 0.61%
Cores Active: 36.36%
GPU Active: 0.82%
Avg Num of Cores Active: 0.55

Core 0 C-state residency: 67.18% (C3: 0.05% C6: 0.00% C7: 67.13% )

CPU 0 duty cycles/s: active/idle [< 16 us: 460.43/285.83] [< 32 us: 88.59/102.89] [< 64 us: 289.41/301.53] [< 128 us: 172.02/162.68] [< 256 us: 140.24/105.08] [< 512 us: 139.84/72.90] [< 1024 us: 86.60/95.15] [< 2048 us: 35.75/169.24] [< 4096 us: 7.15/127.92] [< 8192 us: 2.18/0.00] [< 16384 us: 0.60/0.00] [< 32768 us: 0.40/0.00]
CPU Average frequency as fraction of nominal: 99.74% (3490.82 Mhz)

CPU 1 duty cycles/s: active/idle [< 16 us: 1421.62/603.05] [< 32 us: 199.23/284.64] [< 64 us: 225.25/289.41] [< 128 us: 129.51/224.65] [< 256 us: 56.61/198.83] [< 512 us: 30.79/122.76] [< 1024 us: 11.72/67.34] [< 2048 us: 3.77/103.69] [< 4096 us: 0.60/148.18] [< 8192 us: 0.00/34.96] [< 16384 us: 0.00/1.59] [< 32768 us: 0.00/0.00]
CPU Average frequency as fraction of nominal: 98.55% (3449.32 Mhz)

Core 1 C-state residency: 68.04% (C3: 0.06% C6: 0.00% C7: 67.98% )

CPU 2 duty cycles/s: active/idle [< 16 us: 370.25/303.51] [< 32 us: 98.72/68.73] [< 64 us: 161.49/236.18] [< 128 us: 152.95/144.61] [< 256 us: 166.65/106.67] [< 512 us: 223.07/62.57] [< 1024 us: 91.17/73.30] [< 2048 us: 28.40/161.09] [< 4096 us: 5.16/142.62] [< 8192 us: 0.40/0.00] [< 16384 us: 0.60/0.00] [< 32768 us: 0.20/0.00]
CPU Average frequency as fraction of nominal: 99.95% (3498.17 Mhz)

CPU 3 duty cycles/s: active/idle [< 16 us: 2213.37/881.93] [< 32 us: 155.33/415.74] [< 64 us: 152.55/372.24] [< 128 us: 86.60/268.35] [< 256 us: 41.12/219.09] [< 512 us: 20.06/108.65] [< 1024 us: 9.34/80.65] [< 2048 us: 3.18/128.12] [< 4096 us: 0.60/202.61] [< 8192 us: 0.00/4.77] [< 16384 us: 0.00/0.00] [< 32768 us: 0.00/0.00]
CPU Average frequency as fraction of nominal: 98.32% (3441.21 Mhz)
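As a rough illustration of how the text output above could be read by another tool, here is a minimal Python sketch that extracts the package power line from the Intel-style output. The function name and the regular expression are assumptions made for this example; they are based only on the sample shown in this comment, and other macOS versions or machines may format the line differently.

```python
import re

# Matches the line seen in the sample above, e.g.
# "Intel energy model derived package power (CPUs+GT+SA): 6.01W"
# Assumption: the wording of this line is stable; it is taken from one sample only.
PACKAGE_POWER_RE = re.compile(
    r"^Intel energy model derived package power \(CPUs\+GT\+SA\):\s*([\d.]+)\s*W\s*$",
    re.MULTILINE,
)


def package_power_watts(powermetrics_text):
    """Return every package power reading (in watts) found in the text."""
    return [float(value) for value in PACKAGE_POWER_RE.findall(powermetrics_text)]


# A shortened copy of the sample output above.
sample = """\
**** Processor usage ****

Intel energy model derived package power (CPUs+GT+SA): 6.01W

LLC flushed residency: 56%
"""

print(package_power_watts(sample))
```

Since powermetrics prints one such block per sampling interval, the function returns a list so that a longer capture with several samples can be processed in one pass.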

mrchrisadams avatar Dec 30 '20 13:12 mrchrisadams

This may be useful: https://developer.arm.com/solutions/internet-of-things/languages-and-libraries/rust

bpetit avatar Jan 12 '21 10:01 bpetit

Hey, sorry for being inactive in the past weeks. University has started again but I'll try to continue testing this as soon as possible.

lucabeetz avatar Jan 16 '21 12:01 lucabeetz

The summary from my research is that it is not possible to get measurements from generic ARM chips, e.g. Raspberry Pi, purely via software.

However, macOS does expose power data through powermetrics as @mrchrisadams mentioned above. This is different from the "Energy Impact" measurement available from Activity Monitor and top (see this useful analysis). The manpage for powermetrics explains:

The cpu_power sampler reports data derived from the Intel energy models; as of the Sandy Bridge intel microarchitecture, the Intel power control unit internally maintains an energy consumption model whose details are proprietary, but are likely based on duty cycles for individual execution units, current voltage/frequency etc. These numbers are not strictly accurate but are correlated with actual energy consumption. This section lists: power dissipated by the processor package which includes the CPU cores, the integrated GPU and the system agent (integrated memory controller, last level cache), and separately, CPU core power and GT (integrated GPU) power (the latter two in a forthcoming version). The energy model data is generally not comparable across machine models.

but this is all Intel specific - there's no mention of the Apple Silicon (ARM) M1. Has Apple implemented some custom ARM instructions to expose power consumption data unlike any other ARM chip, or are they using their own internal energy models which are representative but not accurate? Unfortunately it seems that powermetrics is not open source, so we can't see how it's collecting the data.

Running on my M1 MacBook Air, the output includes power metrics:

# david @ dm-mba-m1 in ~/ o [11:38:32]
$ sudo powermetrics  --samplers cpu_power
Machine model: MacBookAir10,1
OS version: 20G165
Boot arguments:
Boot time: Sun Oct  3 15:14:36 2021



*** Sampled system activity (Sat Oct  9 11:38:50 2021 +0100) (5004.06ms elapsed) ***


**** Processor usage ****

E-Cluster Power: 36 mW
E-Cluster HW active frequency: 993 MHz
E-Cluster HW active residency:  16.70% (600 MHz: .19% 972 MHz:  96% 1332 MHz: 1.6% 1704 MHz: .85% 2064 MHz: .90%)
E-Cluster idle residency:  83.30%
cpu 0 frequency: 1020 MHz
cpu 0 idle residency:  93.02%
cpu 0 active residency:   6.98% (600 MHz: .02% 972 MHz: 6.4% 1332 MHz: .29% 1704 MHz: .13% 2064 MHz: .13%)
cpu 1 frequency: 1020 MHz
cpu 1 idle residency:  93.04%
cpu 1 active residency:   6.96% (600 MHz: .01% 972 MHz: 6.3% 1332 MHz: .46% 1704 MHz: .08% 2064 MHz: .10%)
cpu 2 frequency: 1035 MHz
cpu 2 idle residency:  95.47%
cpu 2 active residency:   4.53% (600 MHz: .01% 972 MHz: 4.0% 1332 MHz: .26% 1704 MHz: .11% 2064 MHz: .11%)
cpu 3 frequency: 1018 MHz
cpu 3 idle residency:  94.67%
cpu 3 active residency:   5.33% (600 MHz: .01% 972 MHz: 5.0% 1332 MHz: .17% 1704 MHz: .06% 2064 MHz: .13%)

P-Cluster Power: 6 mW
P-Cluster HW active frequency: 630 MHz
P-Cluster HW active residency:   0.39% (600 MHz:  97% 828 MHz: .57% 1056 MHz: .38% 1284 MHz: .44% 1500 MHz: .11% 1728 MHz: .29% 1956 MHz: .12% 2184 MHz: .20% 2388 MHz: .27% 2592 MHz: .17% 2772 MHz: .08% 2988 MHz: .04% 3096 MHz: .05% 3144 MHz: .04% 3204 MHz: .05%)
P-Cluster idle residency:  99.61%
cpu 4 frequency: 1501 MHz
cpu 4 idle residency:  99.65%
cpu 4 active residency:   0.35% (600 MHz: .03% 828 MHz: .14% 1056 MHz: .07% 1284 MHz: .00% 1500 MHz: .00% 1728 MHz: .00% 1956 MHz: .00% 2184 MHz: .00% 2388 MHz: .00% 2592 MHz: .03% 2772 MHz:   0% 2988 MHz: .00% 3096 MHz:   0% 3144 MHz:   0% 3204 MHz: .07%)
cpu 5 frequency: 2806 MHz
cpu 5 idle residency:  99.92%
cpu 5 active residency:   0.08% (600 MHz: .01% 828 MHz: .00% 1056 MHz:   0% 1284 MHz:   0% 1500 MHz:   0% 1728 MHz:   0% 1956 MHz:   0% 2184 MHz: .00% 2388 MHz: .00% 2592 MHz: .02% 2772 MHz:   0% 2988 MHz: .00% 3096 MHz:   0% 3144 MHz:   0% 3204 MHz: .05%)
cpu 6 frequency: 769 MHz
cpu 6 idle residency: 100.00%
cpu 6 active residency:   0.00% (600 MHz: .00% 828 MHz:   0% 1056 MHz:   0% 1284 MHz:   0% 1500 MHz:   0% 1728 MHz:   0% 1956 MHz:   0% 2184 MHz:   0% 2388 MHz:   0% 2592 MHz: .00% 2772 MHz:   0% 2988 MHz:   0% 3096 MHz:   0% 3144 MHz:   0% 3204 MHz:   0%)
cpu 7 frequency: 600 MHz
cpu 7 idle residency: 100.00%
cpu 7 active residency:   0.00% (600 MHz: .00% 828 MHz:   0% 1056 MHz:   0% 1284 MHz:   0% 1500 MHz:   0% 1728 MHz:   0% 1956 MHz:   0% 2184 MHz:   0% 2388 MHz:   0% 2592 MHz:   0% 2772 MHz:   0% 2988 MHz:   0% 3096 MHz:   0% 3144 MHz:   0% 3204 MHz:   0%)

ANE Power: 0 mW
DRAM Power: 118 mW
CPU Power: 42 mW
GPU Power: 4 mW
Package Power: 46 mW

The E cluster is the 4-core high efficiency cluster. The P cluster is the 4-core performance cluster.
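To show how the M1 output above might be turned into structured data, here is a minimal Python sketch that collects every "<name> Power: <n> mW" line into a dictionary. The function name and the regular expression are assumptions for illustration, derived only from the single sample in this comment; this is not how powermetrics or scaphandre handle the data.

```python
import re

# Matches lines such as "E-Cluster Power: 36 mW" or "Package Power: 46 mW".
# Assumption: power lines always use milliwatts and this exact layout.
POWER_LINE_RE = re.compile(
    r"^(?P<name>[\w-]+) Power:\s*(?P<mw>\d+(?:\.\d+)?)\s*mW\s*$",
    re.MULTILINE,
)


def parse_power_mw(powermetrics_text):
    """Map each power domain name to its reading in milliwatts."""
    return {
        match.group("name"): float(match.group("mw"))
        for match in POWER_LINE_RE.finditer(powermetrics_text)
    }


# A shortened copy of the sample output above.
sample = """\
E-Cluster Power: 36 mW
E-Cluster HW active frequency: 993 MHz
P-Cluster Power: 6 mW
P-Cluster idle residency:  99.61%
ANE Power: 0 mW
DRAM Power: 118 mW
CPU Power: 42 mW
GPU Power: 4 mW
Package Power: 46 mW
"""

readings = parse_power_mw(sample)
print(readings)
```

In this one sample, the two cluster values add up to the CPU value (36 + 6 = 42 mW), and CPU plus GPU plus ANE add up to the package value (42 + 4 + 0 = 46 mW), while DRAM is reported separately. Whether that relationship always holds is not something this thread confirms.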

Some work has been done at https://github.com/singhkays/apple-m1-power-consumption-powermetrics/blob/main/powermetrics-parse.py (from the raw terminal) and https://chromium.googlesource.com/android_tools/+/9a70d48fcdd68cd0e7e968f342bd767ee6323bd1/sdk/platform-tools/systrace/catapult/telemetry/telemetry/internal/platform/power_monitor/powermetrics_power_monitor.py (from a plist) to parse the output of powermetrics which may be useful.

davidmytton avatar Oct 09 '21 11:10 davidmytton

Thanks a lot @davidmytton for those resources. I'll dig into them when I focus on ARM.

I was searching for data on ARM processors for another project and found this: https://developer.arm.com/documentation/ddi0388/i/functional-description/power-management/cortex-a9-processor-power-control?lang=en

Maybe not applicable to all models, but it may be worth considering.

bpetit avatar Feb 02 '22 11:02 bpetit

Hi, we built regression models for the ARM processors of all the Raspberry Pi devices we have (RPi Zero to 4), with a very small margin of error. We'll publish the models and the process to generate them (we propose an automated benchmark architecture) once our journal paper, which is currently awaiting reviews, is out (the work is part of my PhD student's thesis). The models will be integrated into our own power monitoring tool, PowerJoular, but of course they can be implemented in Scaphandre quite easily too.

adelnoureddine avatar Feb 14 '22 13:02 adelnoureddine

If you want/need free access to server class Arm hardware, this website has a few options available to opensource projects and education institutions.

https://developer.arm.com/solutions/infrastructure/works-on-arm

hulksmaaash avatar Feb 14 '22 18:02 hulksmaaash

Specifically on the OCI A1 instance which is free to use. https://www.oracle.com/cloud/compute/arm/ Hetzner will be offering EU based bare metal instances soon too.

vikingforties avatar Jun 09 '22 19:06 vikingforties

Hetzner will be offering EU based bare metal instances soon too.

Oh really? this has me v interested - where can i read more about this?

mrchrisadams avatar Jun 09 '22 21:06 mrchrisadams

Press release is here: https://www.hetzner.com/presse-berichte/2022/03/167137 The release day's not been given out yet, I hear they're just having to wait for networking kit. Equinix are doing Bare Metal Ampere Altra at the moment... https://metal.equinix.com/product/servers/c3-large-arm64/ For reading, there's a bunch of docs linked from https://amperecomputing.com/processors/ampere-altra/ like design specs. Hope that helps.

vikingforties avatar Jun 10 '22 16:06 vikingforties

Progress on an Arm/Ampere/Neoverse sensor:

hwmon can report microjoule-level detail per core - https://hwmon.wiki.kernel.org/device_support_status Ampere makes this available through one of the hwmon drivers in the Ampere LTS kernel - https://github.com/AmpereComputing/ampere-lts-kernel/wiki/hwmon-drivers - it uses the altra_hwmon driver for this. This is exposed to Linux through lm-sensors, and queries the older apm_xgene driver which only aggregates power at the SoC level. Typically the kernel (~5.4) only has the xgene driver built in. I have yet to check whether altra_hwmon has made it to later kernels (~5.15).

In line with Platypus side-channel attack mitigation, fine-grained altra_hwmon output is only available to root. Once altra_hwmon is in the kernel, data is written to a shared memory region, the telemetry buffer, which is readable by the host operating system. The telemetry data buffer is structured using a flexible header to identify exposed metrics and sources.

I think the joules per core allow the calculation of power for the duration of a jiffy. Is there anything else needed? Ampere is becoming widespread too: Azure, Oracle Cloud, Equinix, Tencent...
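To make the "joules to power" idea concrete, here is a minimal Python sketch that reads cumulative energy counters from the standard hwmon sysfs layout and derives average power from two samples. It assumes the driver exposes `energy*_input` attributes in microjoules, as the generic hwmon sysfs interface defines them; whether altra_hwmon actually does so has not been checked here. The function names are made up for this example, and counter wraparound is not handled.

```python
from pathlib import Path


def read_hwmon_energy_uj(root="/sys/class/hwmon"):
    """Map 'hwmonN:driver/attribute' to a cumulative energy reading in microjoules."""
    readings = {}
    for device in sorted(Path(root).glob("hwmon*")):
        name_file = device / "name"
        # Fall back to the directory name if the driver does not expose a name.
        driver = name_file.read_text().strip() if name_file.exists() else device.name
        for attr in sorted(device.glob("energy*_input")):
            try:
                value = int(attr.read_text().strip())
            except (OSError, ValueError):
                # Unreadable (for example root-only) or malformed attribute: skip it.
                continue
            readings[f"{device.name}:{driver}/{attr.name}"] = value
    return readings


def average_power_watts(energy_start_uj, energy_end_uj, elapsed_s):
    """Average power over an interval, from two cumulative energy samples."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (energy_end_uj - energy_start_uj) / 1_000_000 / elapsed_s


# Example: 2.5 J consumed over half a second is an average of 5 W.
print(average_power_watts(1_000_000, 3_500_000, 0.5))
```

A sensor would call `read_hwmon_energy_uj()` twice, some interval apart, and pass the two readings of the same key to `average_power_watts()`. The `root` parameter exists only so the function can be pointed at a test directory.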

vikingforties avatar Jun 13 '22 12:06 vikingforties

Thanks a lot for those insights @vikingforties !

I'll try to jump on this and see how it could interact with Scaphandre!

bpetit avatar Sep 24 '22 11:09 bpetit

Our research paper is still in the process of publication but I've already integrated the RPi CPU power models in PowerJoular. You can have a look in our repository for the complete list of models we built, but we would like, if used, to be cited (the repo/tool for now, but most importantly the paper when it's published). The benchmarks and tools to generate models for other RPi CPUs will be published with the paper.

adelnoureddine avatar Sep 24 '22 12:09 adelnoureddine

Thanks a lot @adelnoureddine. Of course we would cite you, the powerjoular community, tool and your paper if we use those models. What would you think about merging efforts around the tools to generate profiles, models and store those models, with the Energizta project ? https://github.com/boavizta/energizta

Cc @da-ekchajzer

bpetit avatar Sep 25 '22 06:09 bpetit

To clarify, Energizta will be a tool and collaborative database for aggregating electricity consumption measurements:

  • At different loads
  • With different stress test strategies
  • On different components (CPU, RAM, SERVER)
  • With different electricity consumption measurement strategies (RAPL, watt meter, ...)

The database could be used in several research projects. Our first use will be to infer consumption profiles (models) for different architectures. We have already begun with Intel CPU based on RAPL measures. @adelnoureddine since you have worked on the subject a lot, do not hesitate to tell us the important data to collect that we would have forgotten.

Example with Intel Xeon Platinum

image

Scaphandre application

The consumption profiles could be used to model the electrical consumption depending on the component load for platforms without RAPL support.
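As a minimal sketch of how such a consumption profile could be applied on a platform without RAPL, the Python example below linearly interpolates power from a few (load, watts) points. The function name and the profile values are hypothetical and invented for this example; they are not Energizta data or measurements from any real CPU.

```python
from bisect import bisect_right


def estimate_power_watts(profile, load_percent):
    """Linearly interpolate power for a load, given sorted (load %, watts) points."""
    loads = [load for load, _ in profile]
    # Clamp to the profile's range instead of extrapolating.
    if load_percent <= loads[0]:
        return profile[0][1]
    if load_percent >= loads[-1]:
        return profile[-1][1]
    upper = bisect_right(loads, load_percent)
    (load_lo, watts_lo), (load_hi, watts_hi) = profile[upper - 1], profile[upper]
    fraction = (load_percent - load_lo) / (load_hi - load_lo)
    return watts_lo + (watts_hi - watts_lo) * fraction


# Hypothetical profile for illustration only, NOT measured data.
EXAMPLE_PROFILE = [(0, 10.0), (50, 30.0), (100, 40.0)]

print(estimate_power_watts(EXAMPLE_PROFILE, 25))
```

Interpolating between measured points keeps the estimate tied to real measurements at each load level; a fitted regression model would be an alternative when a smooth curve is preferred.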

da-ekchajzer avatar Sep 25 '22 09:09 da-ekchajzer

Thanks @da-ekchajzer. I think Energizta can benefit from our benchmarking architecture, as we built quite an effective one and we have a proof-of-concept implementation. The paper is about the architecture, with a validation through the RPi models. I'll wait for the paper, or the thesis defense, before we publish this material online. But for sure, I think you can use elements from it.

adelnoureddine avatar Sep 26 '22 07:09 adelnoureddine

Quick update, our paper is published now (we'll post preprints soon). And the benchmark tool we built and used to generate models for Raspberry Pis is also published on our gitlab: https://gitlab.com/joular/cpupowerbench. The crowd-sourced server platform is currently being planned and built, but the client benchmark is published and can be used to generate new power models.

The full models (polynomial and linear) are in a JSON file in our PowerJoular repo. I've recently used our benchmark tool to generate a power model for an Asus Tinker Board.
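For readers unfamiliar with this kind of model, here is a minimal Python sketch of evaluating a polynomial power model for a given CPU utilization. It assumes the model maps a utilization fraction between 0 and 1 to watts and that coefficients are listed from the highest degree down to the constant term; both the layout and the coefficient values below are hypothetical and are not taken from the PowerJoular JSON file.

```python
def polynomial_power_watts(coefficients, cpu_utilization):
    """Evaluate a polynomial power model with Horner's method.

    coefficients: ordered from the highest-degree term to the constant term.
    cpu_utilization: fraction of CPU in use, between 0.0 and 1.0.
    """
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be between 0.0 and 1.0")
    power = 0.0
    for coefficient in coefficients:
        power = power * cpu_utilization + coefficient
    return power


# Hypothetical coefficients for illustration only: 1.0*u^2 + 2.0*u + 3.0
EXAMPLE_COEFFICIENTS = [1.0, 2.0, 3.0]

print(polynomial_power_watts(EXAMPLE_COEFFICIENTS, 0.5))
```

A linear model is the same thing with two coefficients (slope and intercept), so one function can serve both model types.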

adelnoureddine avatar Jan 10 '23 10:01 adelnoureddine

I am interested in following this issue. My aim is to assess whether scaphandre can be used to monitor a mix of K8s clusters running on Intel architecture and K3s clusters running on ARM-based Raspberry Pis.

BGOtura avatar Mar 23 '23 16:03 BGOtura

Noting here the recent research work from @adelnoureddine and the resulting implementation for Raspberry Pi models in PowerJoular:

  • https://github.com/joular/powerjoular/blob/develop/src/raspberry_pi_cpu_formula.adb
  • https://github.com/joular/powerjoular/blob/develop/src/raspberry_pi_cpu_formula.ads
  • https://github.com/joular/powerjoular/pull/33/files

bpetit avatar Nov 09 '23 20:11 bpetit