Update Grafana dashboards with the new container metrics
Why is this PR needed? PR #287 will update the Prometheus metrics and affect the current Grafana dashboard: the new metrics report energy per container and have more meaningful names. More details are in issue #286.
Currently, it is difficult to understand all the queries in the existing Grafana dashboard. Some of the constant values are not obvious, and some of the queries are wrong. For example:
The sum_over_time(pod_curr_energy_in_core_millijoule{pod_namespace="$namespace", pod_name="$pod"}[24h])*15/3/3600000000 query:
sum_over_time sums the metric over the timeframe (the value in the square brackets), producing a cumulative number from the gauge. The problem here is granularity: the gauge is reported every 3s, so the query will not correctly aggregate across those 3s samples. Instead of a gauge, a counter should be used, e.g., pod_aggr_energy_in_core_millijoule, although of course sum_over_time makes no sense on a counter either. If we use the counter, to get kWh we need the increase function:
1 W*s = 1 J and 1 J = (1/3600000) kWh ≈ 0.000000277777777777778 kWh
(sum(increase(pod_aggr_energy_in_core_millijoule{}[1h])))*0.000000277777777777778
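As a quick sanity check of the factor, with an illustrative number and treating the counter value as joules (as the formula above does): an increase of 392400 over the hour gives 392400 * 0.000000277777777777778 ≈ 0.109 kWh. Note that since the metric name suggests millijoules, an additional division by 1000 would presumably be needed to get true kWh.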
So, in Prometheus, such results are based on averages and approximations: the increase function effectively takes the average per-second rate over the time window and multiplies it by the window length.
Also, if we use a counter, division by 3 makes no sense, as the rate function already returns values per second... and increase just takes the rate and multiplies it by the interval.
Additionally, I didn't understand the multiplication by 15 and the division by 3600000000...
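For illustration, a minimal sketch of what a corrected power query over the counter might look like, assuming pod_aggr_energy_in_core_millijoule is a monotonically increasing counter reported in millijoules (inferred from the name; the division by 1000 converts mJ/s to W):
# average power in watts over the last minute (mJ/s / 1000 = W)
rate(pod_aggr_energy_in_core_millijoule{pod_namespace="$namespace", pod_name="$pod"}[1m]) / 1000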
Another example:
The rate(pod_curr_energy_in_gpu_millijoule{}[1m])/3 query.
The previous metric pod_curr_energy_in_gpu_millijoule was a gauge, and rate over a gauge doesn't make sense... Again, it would make sense to use the counter pod_aggr_energy_in_core_millijoule, but not to divide by 3....
What does this PR do? This PR updates the Grafana dashboard with the new metrics and the proper queries.
For the query that returns watts, we have:
sum without (command, container_name)(
rate(kepler_container_package_joules_total{}[5s])
)
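Since kepler_container_package_joules_total is reported in joules, rate over it returns joules per second, which is watts directly, so no scaling constant is needed. For example, a container that consumed 30 J within the 5s window yields 30/5 = 6 J/s = 6 W.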
And another query returns kWh per day:
Note that, to calculate kWh, we need to multiply the kilowatts by the hours of daily use, so we count how many hours within a day the container was running (a worked check follows the query).
sum by (pod_name, container_name) (
(increase(kepler_container_package_joules_total{}[1h]) * $watt_per_second_to_kWh)
*
(count_over_time(kepler_container_package_joules_total{}[24h]) /
count_over_time(kepler_container_package_joules_total{}[1h])
)
)
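As a worked check with illustrative numbers: a container drawing a constant 100 W that has been up the whole day shows increase(...[1h]) ≈ 100 * 3600 = 360000 J, so the first factor is 360000 * (1/3600000) = 0.1 kWh; the second factor counts 24 hours, giving 0.1 * 24 = 2.4 kWh per day, which matches 100 W * 24 h.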
I have also fixed other minor issues in the dashboard, such as:
- have the All value in the namespace and pod variables
- make the Coal, Natural Gas and Petroleum Coefficient transparent and editable
Signed-off-by: Marcelo Amaral [email protected]
@marceloamaral the unit looks quite big, 233kWh in kube-system is quite a lot.
@sustainable-computing-io/kepler-deployment
Consider the figure of Total Power Consumption in Watts.
Taking the PKG only, we have an average of 109 watts (which is a per-second unit).
In 1h, we have 109*3600 = 392400 => 392 kwh
Reading RAPL directly (without using Kepler), we also have 109 Watts for Packages in all sockets:
sudo ./rapl-read
RAPL read -- use -s for sysfs, -p for perf_event, -m for msr
Found Skylake-X Processor type
0 (0), 1 (0), 2 (0), 3 (0), 4 (0), 5 (0), 6 (0), 7 (0)
8 (0), 9 (0), 10 (0), 11 (0), 12 (0), 13 (0), 14 (0), 15 (0)
16 (0), 17 (0), 18 (0), 19 (0), 20 (1), 21 (1), 22 (1), 23 (1)
24 (1), 25 (1), 26 (1), 27 (1), 28 (1), 29 (1), 30 (1), 31 (1)
32 (1), 33 (1), 34 (1), 35 (1), 36 (1), 37 (1), 38 (1), 39 (1)
40 (2), 41 (2), 42 (2), 43 (2), 44 (2), 45 (2), 46 (2), 47 (2)
48 (2), 49 (2), 50 (2), 51 (2), 52 (2), 53 (2), 54 (2), 55 (2)
56 (2), 57 (2), 58 (2), 59 (2), 60 (3), 61 (3), 62 (3), 63 (3)
64 (3), 65 (3), 66 (3), 67 (3), 68 (3), 69 (3), 70 (3), 71 (3)
72 (3), 73 (3), 74 (3), 75 (3), 76 (3), 77 (3), 78 (3), 79 (3)
Detected 80 cores in 4 packages
Trying sysfs powercap interface to gather results
Sleeping 1 second
Package 0
package-0 : 33.977086J
dram : 5.870488J
Package 1
package-1 : 23.706666J
dram : 5.831993J
Package 2
package-2 : 23.227846J
dram : 5.540650J
Package 3
package-3 : 28.262440J
dram : 5.049948J
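For reference, summing the four package readings over the 1-second sample: 33.977086 + 23.706666 + 23.227846 + 28.262440 ≈ 109.17 J in 1 s, i.e., roughly 109 W, matching the figure above.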
@marceloamaral it's great to see that you're around to get this solved!
for sure we need to generalize the dashboard before merging, otherwise it won't be possible to import it in general cases. I left a comment about this - from previous experience, changing the "datasource" dictionary (with type + uid) into a plain "datasource": "prometheus" should be enough to generalize this towards using a consistently named "prometheus" datasource.
also as a side note: there's no consistent standard for the Grafana versions everyone here is using - I've seen 7.x, 9.x and now 8.x here (through 'pluginVersion' lines). Not sure whether this might cause any issues in the future, as Grafana is not entirely my main field of expertise.
Also, it would be good to understand where the mentioned *15/3600000000 came from - @rootfs, do you know who could explain the logic behind these? 3600000000 seems fairly obvious (it boils down to 3600 * 1000 * 1000), but the *15 multiplication is hard to understand, just as @marceloamaral says. IMO this needs to be explained so we at least know we're not missing any necessary logic that resulted in the *15 multiplication.
@marceloamaral I feel stupid now, but isn't the "wattage" statistic usually delivered on a per-hour basis, not per-second? 392kWh in one hour feels pretty impossible.
In 1h, we have 109*3600 = 392400 => 392 kwh
Actually, with 109W (aka 109J/s), doesn't it follow that 109J/s * 3600s [(J/s)*s is either J or W*s] = 392400J [or W*s] = 392.4kJ [or kW*s]? I don't see where the conversion to kWh is done. If using the commonly cited conversion factor 1J = 0.0000002778kWh (0.0000002778 = 1/3600000 - fixing units W*s -> kW*h requires division), we arrive at 392400*0.0000002778kWh, which is 0.10900872kWh, as expected from the definition of kWh - or did I make a major mistake here?
@Feelas you are absolutely right! I was missing a step!
We have 392400 W*s = 392400 J, and 1 J = (1/3600000) kWh, so it must be ~0.109 kWh.
To make this conversion more transparent, I created a constant for it.
I updated the dashboard and the PR description accordingly.
@Feelas I have also created the variable datasource, so it might help with the issue
I can see that the Grafana operator has some way to populate the variable automatically, so we'll try to make use of it. If this documentation is up to date, we'll also need to include the dashboard variable in the __inputs section for it to be possible to set it up inside a YAML resource. It looks like going that way would make the dashboard itself more flexible - easy to import manually, as the datasources should then automatically be available when importing, and possible to parametrize in manifests.
@sallyom are you familiar with using .spec.datasources from Grafana dashboard CRDs to set a datasource? This would probably remediate #262 and similar bugs in the future.
@rootfs could you please review this PR again? I updated the dashboard figure.
There was a mistake in the calculation of kWh, but everything should be fine now.
@Feelas
I fixed one more thing:
To calculate kWh we need to multiply the kilowatts by the hours of daily use, so we count how many hours within a day the container was running. Some containers might have been running for less than 24h...
sum by (pod_name, container_name) (
(increase(kepler_container_package_joules_total{}[1h]) * $watt_per_second_to_kWh)
*
(count_over_time(kepler_container_package_joules_total{}[24h]) /
count_over_time(kepler_container_package_joules_total{}[1h])
)
)
The second block returns the number of hours the container was running by taking all samples within 24h and dividing by all samples within 1h.
Some containers might have been running for less than 24h...
Yeah, but wouldn't a container running for 10 minutes automatically use up less kWh (since we're using totals)? I cannot think of an example or a counterexample, so please explain more about what exactly "not doing this" breaks for us.
@Feelas
Hmm, let me try to explain the logic and then we can check if it makes sense for our use case....
There are different ways to report kWh:
- per day (1 kW in a period of 30 minutes = 1 * 0.5 = 0.5 kWh)
- per month (0.5 kWh per day over a period of 30 days = 0.5 * 30 = 15 kWh)
The idea is to report the kWh per day.
The increase(kepler_container_package_joules_total{}[1h]) * $watt_per_second_to_kWh expression
returns the kWh for 1 hour, which means that 1 kW over a period of 1h = 1 * 1 = 1 kWh....
But we want to calculate per day, so we need to multiply by how many hours the container is running... and this is the second function...
One would think that we could use the range of [24h] instead of [1h] in the previous function....
However, the increase function will multiply the average value of the 24h interval by 24. The increase function calculates an approximation...
But some containers may not be running for 24 hours, so we need to calculate how many hours the container was running.
The second function count_over_time(kepler_container_package_joules_total{}[24h]) / count_over_time(kepler_container_package_joules_total{}[1h]) returns how many hours the container was running by counting the number of samples in 24h divided by the number of samples in 1h. This is necessary because the Prometheus scrape interval can vary. For example, the default interval is 30s, but Kepler configures it to use 3s...
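As a worked example, assuming a 3s scrape interval: a container that ran the full day has 24*3600/3 = 28800 samples in [24h] and 3600/3 = 1200 samples in [1h], so the ratio is 28800/1200 = 24 hours; a container that ran only 6 hours has ~7200 samples in [24h], giving 7200/1200 = 6 hours.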
container running for 10 minutes
So maybe we should calculate hours at the granularity of minutes to get a fraction:
(count_over_time(kepler_container_package_joules_total{}[1440m]) /
count_over_time(kepler_container_package_joules_total{}[1m])) / 60
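With the same assumed 3s scrape interval, a container that ran for only 10 minutes has 10*60/3 = 200 samples in [1440m] and 60/3 = 20 samples in [1m], so the ratio is 200/20 = 10 minutes, and dividing by 60 gives ≈ 0.167 hours.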
However, the increase function will multiply the average value of the 24h interval by 24. The increase function calculates an approximation...
I get it, so this is a necessary part to implement because of how the increase function works?
So maybe we should calculate hours at the granularity of minutes to get a fraction:
Sorry if that was a confusing example on that front; I'm not necessarily stressing that the time precision is not high enough, I just provided an example of a 1/6-hour runtime "just using less kWh".
@Feelas in case you agree with the metrics, could you please lgtm this PR?
Hi,
Just tested this PR on our freshly installed Kepler and it works fine, except we had to change irate(metric[1m]) to irate(metric[2m]) for all queries in two panels (a sketch of the change follows the list):
- Pod/Process Power Consumption (W) in Namespace: $namespace
- Total Power Consumption (W) in Namespace: $namespace
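For illustration, a sketch of that change on one query (the metric shown is an assumption; the actual panel queries may differ):
# before: with slower scrape intervals, a 1m window may contain fewer than the two samples irate needs
irate(kepler_container_package_joules_total{}[1m])
# after: a wider 2m window makes two samples more likely
irate(kepler_container_package_joules_total{}[2m])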
let's merge it to work with the new metrics.
@marceloamaral I feel the query is likely to return wrong values for consumption. Your inputs on https://github.com/sustainable-computing-io/kepler/discussions/946 will be highly appreciated 🙇