scaphandre
Support for alternative ways to collect the power usage data
Problem
It seems that scaphandre currently focuses on a single way of collecting power consumption: the powercap_rapl kernel capabilities.
Could there be a way to collect it through other commands, and maybe approximate which process is consuming power through custom code?
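To make the "approximation" idea concrete, here is a minimal sketch of one possible approach: splitting a host-level wattage reading across processes in proportion to their CPU time over the same sampling interval. This is an assumption about how such custom code could look, not a description of what scaphandre does; the function name `attribute_power` and its parameters are hypothetical.

```rust
/// Estimate the power attributable to one process over a sampling interval,
/// assuming power scales linearly with the share of CPU time it used.
/// `host_watts` is a host-level reading (from any source), and the tick
/// counts are CPU time deltas measured over the same interval.
/// Returns None when the inputs cannot yield a meaningful share.
fn attribute_power(host_watts: f64, process_cpu_ticks: u64, total_cpu_ticks: u64) -> Option<f64> {
    // No CPU time recorded, or inconsistent counters: no estimate.
    if total_cpu_ticks == 0 || process_cpu_ticks > total_cpu_ticks {
        return None;
    }
    Some(host_watts * process_cpu_ticks as f64 / total_cpu_ticks as f64)
}
```

For example, a process that used 25 of 100 CPU ticks on a host drawing 200 W would be attributed 50 W. This ignores idle power and non-CPU components, so it is only a rough estimate.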
In our use case, we have a bunch of "old" Dell PowerEdge servers, and I know that we can extract some wattage indicators through the (proprietary?) omreport command (see https://serverfault.com/questions/736068/how-do-i-get-the-power-consumption-of-a-dell-poweredge-server-on-the-cli). We used to graph these using a munin plugin, and at some point we also had a proof-of-concept grafana dashboard working with metrics collected by that same plugin, which sent them to graphite using saltstack as an orchestrator. I'm sure we could have a prometheus exporter that does something similar?
Would this be in the scope of the project? I think we could even extend the collection to network devices that expose power indicators over SNMP, for example.
Solution
see above
Hi @arthurzenika,
Thanks for starting this thread. Here is an overview of what has been started or imagined so far, apart from Powercap+RAPL as a source:
- defining and using a model based on component usage to estimate power instead of reading it directly from RAPL; #25 is the thread to follow
- collecting RAPL metrics directly from the MSRs (instead of asking powercap): it works on Windows for now, and #74 is the feature to come. This is probably not what you are looking for, as old machines (<2012) may not provide RAPL support
What we could imagine:
- new sensor: querying omreport if it is available/installed on the machine (this kind of project makes me think this is possible)
- new sensor: querying SNMP, but this needs some thinking, as scaphandre is primarily intended to give metrics about the machine where it runs
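As a rough illustration of the omreport idea, the sketch below parses a wattage value out of text shaped like `Reading : 168 W`. That line layout is an assumption based on the kind of output shown in the serverfault thread linked above, not a documented format, and `parse_power_watts` is a hypothetical helper, not part of scaphandre's sensor API. Running the command itself is left out on purpose.

```rust
/// Extract the first power reading, in watts, from omreport-like text.
/// ASSUMPTION: the reading appears on a line of the form `Reading : 168 W`.
/// Returns None if no such line is found or the value is not a number of watts.
fn parse_power_watts(output: &str) -> Option<f64> {
    output.lines().find_map(|line| {
        // Split "key : value" on the first colon only.
        let (key, value) = line.split_once(':')?;
        if key.trim() != "Reading" {
            return None;
        }
        // Expect a number followed by the unit "W".
        let mut parts = value.split_whitespace();
        let watts = parts.next()?.parse::<f64>().ok()?;
        match parts.next() {
            Some("W") => Some(watts),
            _ => None,
        }
    })
}
```

A sensor built this way would only report host-level consumption. Lines such as threshold values are skipped because their key is not exactly `Reading`.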
Does one of those items ring a bell? What do you think?
@bpetit thanks for taking a look at this. Yes, these approaches correspond to what I initially imagined. Indeed, getting stats from other tools might mean a "degraded" mode, i.e. "you can see the global consumption but not the details by process/container/etc".
Linking this issue with #289 and #24 as well.
Latest developments on hardware inventory may help with further work on power estimation models for components other than CPU/RAM, especially hard drives.