
Support for alternative ways to collect the power usage data

Open • arthurzenika opened this issue 2 years ago • 3 comments

Problem

Right now, scaphandre seems focused on a single way of collecting power consumption: the Powercap_rapl kernel capabilities.

Could there be a way to collect it through other commands, and perhaps use custom code to approximate which processes are consuming power?
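One common way to approximate per-process power from a single host-level reading is to split it in proportion to the CPU time each process used over the same interval. The sketch below assumes you already have the host reading (from RAPL, omreport or another source) and per-process CPU ticks (on Linux, for instance utime + stime from /proc/<pid>/stat). The function name attribute_power is hypothetical, and this is not presented as scaphandre's exact algorithm.

```python
from __future__ import annotations


def attribute_power(host_watts: float, cpu_ticks: dict[str, int]) -> dict[str, float]:
    """Split a host-level power reading across processes by CPU-time share.

    cpu_ticks maps a process name or PID to the CPU ticks it used during the
    same interval over which host_watts was measured.
    """
    total = sum(cpu_ticks.values())
    if total == 0:
        # No CPU activity recorded in the interval: nothing to attribute.
        return {proc: 0.0 for proc in cpu_ticks}
    return {proc: host_watts * ticks / total for proc, ticks in cpu_ticks.items()}


if __name__ == "__main__":
    print(attribute_power(120.0, {"nginx": 300, "postgres": 600, "cron": 100}))
    # {'nginx': 36.0, 'postgres': 72.0, 'cron': 12.0}
```

This charges all host power, idle baseline included, to whichever processes were active, so it is only a rough approximation; subtracting an estimated idle baseline before splitting would be a natural refinement.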

In our use case, we have a number of "old" Dell PowerEdge servers, and I know we can extract some wattage indicators with the omreport command (proprietary?) (see https://serverfault.com/questions/736068/how-do-i-get-the-power-consumption-of-a-dell-poweredge-server-on-the-cli). We used to graph these with a Munin plugin. At some point we also had a proof-of-concept Grafana dashboard fed by that same plugin, except that the metrics were sent to Graphite with SaltStack as the orchestrator. Presumably a Prometheus exporter could do something similar?
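A minimal sketch of the omreport route, assuming `omreport chassis pwrmonitoring` prints the current draw on a line like `Reading : 168 W` (check this against the output of your OpenManage version; the sample text below is illustrative, not captured from a real server). The metric name dell_omreport_power_watts and the helper names are hypothetical. The output follows the Prometheus text exposition format, so it could be served by a small exporter or written to a file for node_exporter's textfile collector.

```python
from __future__ import annotations

import re
import subprocess

# Assumed line format: "Reading               : 168 W"
READING_RE = re.compile(r"^\s*Reading\s*:\s*([0-9]+(?:\.[0-9]+)?)\s*W\s*$", re.MULTILINE)


def parse_omreport_watts(output: str) -> list[float]:
    """Return every wattage value found on a 'Reading : N W' line."""
    return [float(m.group(1)) for m in READING_RE.finditer(output)]


def run_omreport() -> str:
    """Run omreport (requires Dell OpenManage Server Administrator)."""
    result = subprocess.run(
        ["omreport", "chassis", "pwrmonitoring"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def to_prometheus(watts: float, metric: str = "dell_omreport_power_watts") -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {metric} Instantaneous power draw reported by omreport.\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {watts}\n"
    )


# Illustrative sample only; threshold lines must not be picked up as readings.
SAMPLE = """\
Power Consumption
Index                 : 1
Status                : Ok
Probe Name            : System Board Pwr Consumption
Reading               : 168 W
Warning Threshold     : 994 W
"""

if __name__ == "__main__":
    print(to_prometheus(parse_omreport_watts(SAMPLE)[0]), end="")
```

This only yields a host-wide figure; splitting it per process would still need a separate attribution step.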

Would this be in the scope of the project? We could even extend collection to network devices that expose power indicators, for example over SNMP.
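For the SNMP idea, devices that implement the standard ENTITY-SENSOR-MIB (RFC 3433) expose each sensor in entPhySensorTable with a type, a scale, a precision and a raw integer value; many devices use vendor-specific MIBs instead, so this is only one possible starting point. The sketch below assumes the four columns have already been fetched (for example with snmpwalk) and only shows the conversion to watts. sensor_to_watts is a hypothetical name, and only a subset of the scale values is mapped.

```python
from __future__ import annotations

# Enumerated values from ENTITY-SENSOR-MIB (RFC 3433).
SENSOR_TYPE_WATTS = 6  # entPhySensorType: watts(6)
SCALE_EXPONENT = {     # entPhySensorScale -> power of ten (subset only)
    7: -6,  # micro
    8: -3,  # milli
    9: 0,   # units
    10: 3,  # kilo
    11: 6,  # mega
}


def sensor_to_watts(sensor_type: int, scale: int, precision: int, value: int) -> float | None:
    """Convert one entPhySensorTable row to watts; None if it is not a power sensor."""
    if sensor_type != SENSOR_TYPE_WATTS:
        return None
    if scale not in SCALE_EXPONENT:
        raise ValueError(f"unsupported entPhySensorScale value: {scale}")
    # A positive precision is the number of decimal places encoded in the integer;
    # as we read RFC 3433, negative values describe accuracy, not a decimal shift.
    decimals = precision if precision > 0 else 0
    return value * 10.0 ** (SCALE_EXPONENT[scale] - decimals)
```

For example, a row with type 6, scale 9 (units), precision 1 and value 1234 reads as 123.4 W.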

Solution

See above.

arthurzenika, Nov 25 '21 10:11

Hi @arthurzenika,

Thanks for starting this thread. Here is an overview of what has been started or imagined so far, apart from Powercap+RAPL as a source:

  • defining and using a model based on component usage to estimate power, instead of reading it directly from RAPL: #25 is the thread to follow
  • collecting RAPL metrics directly from the MSRs (instead of going through powercap): this works on Windows for now, and the upcoming feature is tracked in #74. This is probably not what you are looking for, as old machines (pre-2012) may not support RAPL
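For reference, decoding the package energy counter read from the MSRs comes down to a unit conversion plus wraparound handling. The sketch below uses the Intel-documented addresses MSR_RAPL_POWER_UNIT (0x606, energy status unit in bits 12:8) and MSR_PKG_ENERGY_STATUS (0x611, 32-bit cumulative counter); other vendors use different addresses. The read_msr helper assumes Linux with the msr kernel module loaded and root privileges, which is not how the Windows support mentioned above works. All function names are hypothetical.

```python
from __future__ import annotations

import os
import struct

MSR_RAPL_POWER_UNIT = 0x606    # energy status unit (ESU) in bits 12:8
MSR_PKG_ENERGY_STATUS = 0x611  # cumulative package energy in bits 31:0
COUNTER_MASK = 0xFFFFFFFF      # the energy counter is 32 bits wide and wraps


def read_msr(cpu: int, register: int) -> int:
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (Linux, msr module, root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the MSR address; each register is 8 bytes.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


def energy_unit_joules(power_unit_msr: int) -> float:
    """Joules represented by one increment of the energy counter: 1 / 2**ESU."""
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)


def average_power_watts(prev_raw: int, curr_raw: int, unit_j: float, interval_s: float) -> float:
    """Average package power between two reads taken interval_s seconds apart."""
    # Subtracting modulo 2**32 handles (at most) one wraparound between reads.
    delta = ((curr_raw & COUNTER_MASK) - (prev_raw & COUNTER_MASK)) & COUNTER_MASK
    return delta * unit_j / interval_s
```

In use, one would read MSR_RAPL_POWER_UNIT once, then sample MSR_PKG_ENERGY_STATUS periodically and pass consecutive samples to average_power_watts. Sampling often enough matters: if the counter wraps more than once between two reads, the result is silently too low.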

What we could imagine:

  • new sensor: querying omreport when it is installed on the machine (this kind of project makes me think it is possible)
  • new sensor: querying SNMP, although this needs some thought, as scaphandre is primarily intended to report metrics about the machine it runs on
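To illustrate how several sources could coexist, here is a sketch of a priority list of sensors where the first one that returns a reading wins, and remote (SNMP) sources are skipped by default since scaphandre targets the machine it runs on. SensorSource and first_available are hypothetical names, not scaphandre's sensor API (scaphandre itself is written in Rust); each read_watts callable would wrap something like omreport parsing or an SNMP query.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SensorSource:
    name: str
    read_watts: Callable[[], Optional[float]]  # returns None when no reading is available
    local: bool = True  # False for remote devices polled over SNMP


def first_available(sources: list[SensorSource], local_only: bool = True) -> tuple[str, float] | None:
    """Return (source name, watts) from the first source that yields a reading."""
    for source in sources:
        if local_only and not source.local:
            continue  # only report on the machine we run on
        try:
            watts = source.read_watts()
        except (OSError, ValueError, subprocess.SubprocessError):
            # e.g. omreport missing (FileNotFoundError) or unparsable output
            continue
        if watts is not None:
            return source.name, watts
    return None
```

A host-level source such as omreport would not provide per-process detail on its own, so per-process metrics would either be derived by an attribution step or simply omitted for that source.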

Does any of these ring a bell? What do you think?

bpetit, Jul 20 '22 12:07

@bpetit thanks for taking a look at this. Yes, these approaches match what I initially imagined. Getting stats from other tools would indeed be a "degraded" mode, i.e. "you can see the global consumption but not the details per process/container/etc."

arthurzenika, Jul 20 '22 13:07

Linking this issue with #289 and #24 as well.

Recent developments on hardware inventory may help further work on power estimation models for components other than CPU/RAM, especially hard drives.

bpetit, May 18 '23 16:05