flux2 Updating with Guage to update dashboard correctly

Signed-off-by: Santosh Kaluskar [email protected]

This PR could close #3086

Updating current metric of type go_memstats_alloc_bytes_total with go_memstats_heap_sys_bytes.

Sep 12 '22 16:09 Santosh1176

@Santosh1176 can you post a print screen with the two graphs, how it looked before and after? Thanks

Sep 13 '22 07:09 stefanprodan

@stefanprodan Apologies for not testing it before raising a PR. I shall keep that in mind for the future. I referred the Prometheus docs, which recomended not using rate on Guage metrics. But, this wasn't working when I tried it locally:

error-monitor1

When I applied rate to the metric. Its showing as below:

with-rate-monitor

Sep 13 '22 16:09 Santosh1176

this wasn't working when I tried it locally

It doesn’t work because the query is wrong, remove the ()[1m]

Sep 15 '22 09:09 stefanprodan

Thank you @stefanprodan The updated graph:

time

Sep 16 '22 16:09 Santosh1176

Looks good now! Thank you, can you please squash the commits into a single one, rebase with upstream and force push.

Sep 16 '22 16:09 stefanprodan

@Santosh1176 i have tested your changes and while using go_memstats_heap_sys_bytes does make the metrics much more accurate, i don't think its the right one to go for. container_memory_working_set_bytes is the query that we should use as its the standard query used in other dashboards such as Pod Stats & Info and its also what the scheduler tracks before it OOMKills the container (ref: https://faun.pub/how-much-is-too-much-the-linux-oomkiller-and-used-memory-d32186f29c9d).

Sep 19 '22 09:09 aryan9600

container_memory_working_set_bytes is the query that we should use

@aryan9600 @stefanprodan Thanks for this valuable information, so for this has been a good learning experience. I tried applying the said metric to the dashboard, results are below:

container

I had a doubt, as in other metric types I could not select pods to show their memory usage stats. Am I doing something wrong here? Or its the behavior specific to this metric?

Sep 19 '22 17:09 Santosh1176

I had a doubt, as in other metric types I could not select pods to show their memory usage stats. Am I doing something wrong here?

Since we are using sum() and the filter passed to the query selects all pods in the namespace with "controller" in its name, you're seeing the memory being consumed by all these pods. You need to use a group by like:

sum(container_memory_working_set_bytes{namespace="$namespace",container!="POD",container!="",pod=~".*-controller-.*"}) by (pod)

Sep 20 '22 07:09 aryan9600

@Santosh1176 please rebase instead of merging, The 0540722 commit has nothing to do with this PR.

Sep 21 '22 14:09 stefanprodan

0540722 commit has nothing to do with this PR.

Sorry! my bad, I accidentally added the unrelated commit while squashing with rebase, apologies. Can I now use the drop option while rebasing, or that would further complicate the mess. Thanks

Sep 21 '22 15:09 Santosh1176

Please squash the commits into a single one and it's ready to go. Thanks!

Sep 21 '22 15:09 stefanprodan

Please squash the commits into a single one and it's ready to go. Thanks!

Done. Request review @stefanprodan @aryan9600

Sep 22 '22 15:09 Santosh1176

@Santosh1176 you need to force push the squashed commit as well

Sep 22 '22 15:09 aryan9600

Thank you @stefanprodan @aryan9600

Sep 29 '22 13:09 Santosh1176

flux2 flux2 copied to clipboard

Updating with Guage to update dashboard correctly

flux2
flux2 copied to clipboard