Make `pod_total_*` metrics persistent/survive a Kepler restart
The `pod_total_*` metrics represent the total accumulated over all samples since Kepler started monitoring the particular pod. When the Kepler pods are restarted, this value goes back to zero.
Describe the solution you'd like
The `pod_total_*` metrics should persist/survive a restart of the Kepler Pod.
Describe alternatives you've considered
From the Prometheus perspective, I can do something like sum(pod_curr_energy_in_core_millijoule{pod_name='my-pod'})
to get an approximate total as seen by Prometheus. However, Kepler samples every 3 seconds and Prometheus has its own scrape schedule. For this reason, not all samples may be scraped by Prometheus, so the total seen by the sum query can differ from the actual total.
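The drift described above can be shown with a toy simulation (illustrative Python, not Kepler code; the sample values and intervals are made up): if Kepler emits a "current interval" energy sample every 3 s but Prometheus scrapes only every 15 s, summing the scraped samples misses 4 out of every 5 readings.

```python
# Toy simulation (not Kepler code): summing only the samples Prometheus
# actually scraped undercounts the true accumulated energy.
import random

random.seed(42)

# millijoules consumed in each 3 s Kepler interval (made-up values)
samples = [random.uniform(80, 120) for _ in range(100)]
true_total = sum(samples)

# Prometheus scrapes every 15 s, i.e. sees only every 5th Kepler sample
scrape_every = 5
scraped = samples[::scrape_every]
prometheus_sum = sum(scraped)

print(f"true total:     {true_total:.1f} mJ")
print(f"sum of scraped: {prometheus_sum:.1f} mJ")
```

This is why the `pod_total_*` counter, which accumulates inside Kepler itself, is more accurate than any reconstruction on the Prometheus side.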
I would think this requires at least three changes:
- A CLI flag to load and persist metrics
- Functions that allow Kepler to load the last metrics upon start and persist the stats before exit
- A persistent volume for Kepler
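The load/persist pieces above could look roughly like this (a sketch only, with hypothetical flag and function names; Kepler itself is written in Go, Python is used here for brevity):

```python
# Sketch (hypothetical names, not actual Kepler code): a CLI flag plus a
# pair of functions to load pod_total_* stats at startup and persist them
# before exit, using a JSON file on a persistent volume.
import argparse
import json
import os
import tempfile

def load_totals(path):
    """Load persisted pod_total_* stats from the previous run, if any."""
    if not os.path.exists(path):
        return {}  # first start, or persistence was disabled before
    with open(path) as f:
        return json.load(f)

def persist_totals(path, totals):
    """Atomically write the current pod_total_* stats."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(totals, f)
    os.replace(tmp, path)  # atomic rename, so a crash never leaves a torn file

# hypothetical flags, mirroring the "CLI flag to load and persist metrics" idea
parser = argparse.ArgumentParser()
parser.add_argument("--persist-metrics", action="store_true",
                    help="load/persist pod_total_* across restarts")
parser.add_argument("--metrics-file", default="/var/lib/kepler/pod_totals.json",
                    help="file on the persistent volume holding the stats")
```

The atomic write (temp file + rename) matters here: if Kepler is killed mid-write, the previous snapshot stays intact instead of being truncated.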
Maybe we also need a way to clean up those metrics, as they might get really big. Also, it seems a true/false flag means we would persist metrics for all pods, which might be inefficient. The design might consider how to focus on just individual pods, though that can be a long-term thing.
This is a good topic!
Prometheus and Grafana have the logic to enable persistent storage.... So we should also make this configurable...
The challenge here is how do we save data to persistent storage? Maybe we need a database?
@marceloamaral a plain json on a persistent volume?
Right, so we could keep it in memory and dump it to a file every X minutes...
So something like the following:
- mount a volume (ephemeral or NFS, etc.) at a desired place when Kepler starts
- every 3 seconds, update `pod_total_x` and then write it to the file (should not be that heavy an operation?)
- load the file during startup, if it exists, and initialize `pod_total_x` from it
It might make sense to fetch the last "total" metric of a suitable node (if present) directly from Prometheus when starting Kepler's pod, but as mentioned, the numbers would get high very fast. And honestly, is there even a reason to export "total" metrics?
In fact, we might not need the persistent volume for that!
Prometheus does a very good job of handling cases where a counter has been reset: https://prometheus.io/docs/prometheus/latest/querying/functions/#rate
So if Kepler restarts (which should be very rare) it shouldn't be a big problem from Prometheus' point of view. Unless there is a problem with the cluster and Kepler restarts too often... but then that will affect everything...
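Prometheus' reset handling can be illustrated numerically (a simplified sketch; the real `rate()`/`increase()` also extrapolates to the window boundaries): whenever a sample is lower than its predecessor, the counter is assumed to have reset to zero, and the new value is counted on top of what came before.

```python
def simple_increase(samples):
    """Simplified model of Prometheus' counter-reset handling:
    a drop between consecutive samples is treated as a reset,
    so the post-reset value is added on top of the prior increase."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur  # drop => counter reset
    return total

# Kepler restart between the 2nd and 3rd scrape: the counter falls 20 -> 5,
# yet the computed increase still accounts for all the accumulated energy.
print(simple_increase([10, 20, 5, 15]))  # -> 25.0
```

So a restart costs at most the energy accumulated between the last pre-restart scrape and the restart itself, not the whole history.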
honestly is there a reason to export "total" metrics?
There are some mentions of this in the official Prometheus documentation guidelines:
Counters are useful for accumulating the number of events or the amount of something in each event.
Gauges are useful for snapshots of state such as requests in progress, free/total memory, or temperature.
The Prometheus guideline also says:
For base unit Power:
Prefer to export a joule counter, so rate(joules[5m]) gives you power in Watts.
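The joules-to-watts conversion behind that guideline is easy to check (simplified sketch in Python; the real `rate()` additionally handles resets and window-boundary extrapolation): the per-second increase of a joule counter is, by definition, power in watts.

```python
def simple_rate(samples, window_s):
    """Average per-second increase of a (non-resetting) counter,
    i.e. a simplified rate() without Prometheus' extrapolation."""
    return (samples[-1] - samples[0]) / window_s

# joule counter scraped over a 300 s (5 m) window: 1000 J -> 1300 J,
# so the average power is 300 J / 300 s = 1 W
watts = simple_rate([1000, 1150, 1300], window_s=300)
print(watts)  # -> 1.0
```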
We could export power with a gauge, but exporting energy with a counter has some advantages:
- a counter will not miss any events if Prometheus fails to scrape a sample; the value accumulates into the next interval
- using a gauge would force more approximations in the calculation of energy consumption. For example, if we export power, we need to divide the energy by the Kepler collection interval, so it will be another average...
- CPU utilization is measured with a counter, e.g., `container_cpu_usage_seconds_total`