
Monitoring - Flux Control Plane Dashboard shows incorrect memory consumption

Open pjbgf opened this issue 1 year ago • 6 comments

During a recent investigation, we noticed that the controller's memory consumption was displayed incorrectly in our official dashboard when compared with the data from Pod Stats & Info or Pixie:

Flux Control Plane > Memory

image

Pod Stats & Info

image

Pixie PX/Pod

image

We need to review that dashboard and update it accordingly.

pjbgf avatar Sep 08 '22 13:09 pjbgf

@pjbgf I would argue that we show the correct stats by looking at Go stats (go_memstats_alloc_bytes) instead of the OS.

stefanprodan avatar Sep 08 '22 13:09 stefanprodan

Unfortunately, that does not account for CGO or C-allocated objects (i.e. libgit2) that were not freed. After changing the current panel (left) to contain only source controller and adding the same metrics as Pod Stats & Info, I get this for the same time range:

image

If we prefer Go metrics, we could use go_memstats_heap_sys_bytes instead, which seems more accurate for our use case:

image

pjbgf avatar Sep 08 '22 14:09 pjbgf

If go_memstats_heap_sys_bytes captures the CGO usage, then OK, let's change it to that.

stefanprodan avatar Sep 08 '22 14:09 stefanprodan

Can I take this up? Just to reconfirm: the metric in rate(go_memstats_alloc_bytes_total{namespace="$namespace",pod=~".*-controller-.*"}[1m]) should be replaced with the one mentioned by @stefanprodan to resolve this?

Santosh1176 avatar Sep 09 '22 11:09 Santosh1176

@Santosh1176 Yes. Please note that go_memstats_alloc_bytes_total is a counter, while go_memstats_heap_sys_bytes is a gauge.

stefanprodan avatar Sep 09 '22 14:09 stefanprodan

the go_memstats_heap_sys_bytes is a gauge.

Thank you, @stefanprodan. Does that mean rate() shouldn't be used here?

Santosh1176 avatar Sep 10 '22 00:09 Santosh1176
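
For reference, here is a rough sketch of how the counter/gauge distinction usually plays out in PromQL. In general, rate() is meant for counters, while a gauge is plotted as its current value. The label selectors are copied from the panel expression quoted earlier in the thread. Reusing them unchanged for the replacement query is an assumption, not the confirmed final dashboard expression.

```promql
# Counter: the raw value only ever grows, so the panel graphs its
# per-second rate of increase over a 1-minute window.
rate(go_memstats_alloc_bytes_total{namespace="$namespace",pod=~".*-controller-.*"}[1m])

# Gauge: the value already represents the current amount of memory,
# so it is queried directly, without rate().
go_memstats_heap_sys_bytes{namespace="$namespace",pod=~".*-controller-.*"}
```

Here $namespace is the Grafana dashboard variable already used by the existing panel.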