flux2
Monitoring - Flux Control Plane Dashboard shows incorrect memory consumption
During a recent investigation we noticed that the controllers' memory consumption was displayed incorrectly in our official dashboard when compared with the data from Pod Stats & Info or Pixie:
[Screenshots: Flux Control Plane > Memory | Pod Stats & Info | Pixie PX/Pod]
We need to review that dashboard and update it accordingly.
@pjbgf I would argue that we show the correct stats, because we look at Go runtime stats (go_memstats_alloc_bytes) instead of OS-level memory usage.
Unfortunately, that does not account for CGO or C-allocated objects (e.g. libgit2) that were not freed. After changing the current panel (left) to contain only the source controller, and adding the same metrics as Pod Stats & Info, I get this for the same time range:
If we prefer Go metrics, we could use go_memstats_heap_sys_bytes instead, which seems more accurate for our use case:
If go_memstats_heap_sys_bytes captures the CGO usage, then OK, let's change it to that.
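As a minimal sketch of that change, the Memory panel could query the gauge directly instead of wrapping a counter in rate(). This assumes the dashboard keeps its existing $namespace template variable and a pod regex matching the controller pods; in the dashboard JSON the double quotes would appear escaped (\"). Whether this gauge reflects CGO allocations is the open question above: the Go runtime's memory statistics describe memory managed by the Go allocator, so comparing the new panel against Pod Stats & Info remains a useful check.

```
# Sketch of a replacement "Memory" panel query (assumed: the dashboard's
# $namespace variable and the controller pod regex are kept unchanged).
# go_memstats_heap_sys_bytes is a gauge of heap bytes obtained from the OS,
# so it is plotted as-is.
go_memstats_heap_sys_bytes{namespace="$namespace", pod=~".*-controller-.*"}
```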
Can I take this up?
Just to confirm: the metric in rate(go_memstats_alloc_bytes_total{namespace=\"$namespace\",pod=~\".*-controller-.*\"}[1m]) should be replaced with the one mentioned by @stefanprodan to resolve this?
@Santosh1176 Yes. Please note that go_memstats_alloc_bytes_total is a counter, while go_memstats_heap_sys_bytes is a gauge.
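To illustrate the counter/gauge distinction, here is a sketch using the same assumed selector. In PromQL, rate() is intended for counters such as go_memstats_alloc_bytes_total, where it yields bytes allocated per second. A gauge such as go_memstats_heap_sys_bytes already holds the current value and is graphed as-is. If a per-second trend of a gauge is wanted, deriv() is the function meant for gauges. The [1m] and [5m] windows are illustrative choices, and each expression would be its own panel query.

```
# Counter: total bytes allocated since process start (it only resets when
# the process restarts). rate() turns it into bytes allocated per second.
rate(go_memstats_alloc_bytes_total{namespace="$namespace", pod=~".*-controller-.*"}[1m])

# Gauge: per-second change in heap bytes obtained from the OS, estimated
# by linear regression over a 5m window (illustrative window size).
deriv(go_memstats_heap_sys_bytes{namespace="$namespace", pod=~".*-controller-.*"}[5m])
```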
Thank you, @stefanprodan. Does that mean rate() shouldn't be used here?