
Improve healthchecks

Open · ZubairLK opened this issue 6 years ago · 4 comments

balenaOS runs healthchecks on quite a few systemd services. On slow devices like the Pi Zero, these healthchecks can consume a significant share of the available CPU cycles. It would be wiser to make them configurable.

Found while looking into https://github.com/balena-os/meta-balena/issues/1396

ZubairLK, Feb 19 '19

Here is a graph of CPU usage collected with Telegraf/InfluxDB/Grafana:

[Graph: balenaOS CPU usage over time]

balenaOS CPU usage is spiky. Unless I'm mistaken, the spikes come from the various healthchecks, with the supervisor and balenad checks probably being the most CPU-intensive.

ZubairLK, Feb 20 '19

We could investigate lighter-weight healthchecks, or perhaps make the healthcheck frequency user-configurable.
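As a rough way to reason about the frequency option: a periodic check that burns `cost` CPU-seconds every `interval` seconds uses about `cost / interval` of one core on average. The helper below is a hypothetical back-of-the-envelope sketch; `average_cpu_share` is not part of balenaOS, and the per-check costs and intervals in the example are placeholder values, not measurements.

```python
from typing import Iterable, Tuple


def average_cpu_share(checks: Iterable[Tuple[float, float]], cores: int = 1) -> float:
    """Return the average fraction of total CPU capacity used by periodic checks.

    Each entry in `checks` is (cpu_seconds_per_run, interval_seconds).
    A check that burns `cost` CPU-seconds every `interval` seconds uses
    cost / interval of one core on average; dividing by `cores` gives the
    share of the whole machine.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    total = 0.0
    for cost, interval in checks:
        if interval <= 0:
            raise ValueError("interval must be positive")
        total += cost / interval
    return total / cores


if __name__ == "__main__":
    # Hypothetical (cost, interval) pairs for illustration only,
    # not measured on any balenaOS device.
    checks = [(0.5, 30.0), (0.2, 10.0)]
    print(f"{average_cpu_share(checks):.2%} of a single core")
```

Under this simple model, halving the frequency halves the average overhead, but each individual run would still cost the same CPU time, so the short spikes in the graph would likely remain. Making each check lighter reduces both.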

ZubairLK, Feb 22 '19

Related to https://github.com/balena-os/meta-balena/issues/2423

klutchell, Jan 19 '22

We don't want to make the healthchecks user-configurable. If there is an issue with a healthcheck, we should fix it.

We know that the current engine healthcheck also causes wear on storage media, so we would like to replace it with something closer to a status check.

However, we still need a broader solution for checking overall system health, similar to device-diagnostics but running on the device itself and capable of taking automatic recovery steps.

An old spec that is similar can be found here: https://github.com/balena-io/balena-io/pull/2009
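One possible reading of "something closer to a status check" for the engine is a read-only API query instead of running a workload. The sketch below is an assumption, not the current balenaOS implementation: it sends `GET /_ping` (an endpoint of the Docker Engine API, from which balenaEngine is derived) over the engine's Unix socket and treats a `200` response with body `OK` as healthy. The socket path `/var/run/balena-engine.sock` and the helper names are hypothetical; verify the actual path on a device.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; the socket path is used to connect.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def engine_status_ok(socket_path: str, timeout: float = 5.0) -> bool:
    """Hypothetical status check: True if the engine answers GET /_ping with 200 "OK"."""
    conn = UnixHTTPConnection(socket_path, timeout)
    try:
        conn.request("GET", "/_ping")
        resp = conn.getresponse()
        body = resp.read()
        return resp.status == 200 and body.strip() == b"OK"
    except (OSError, http.client.HTTPException):
        # Missing socket, refused connection, timeout, or malformed reply.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumed socket path; adjust for the device being checked.
    print(engine_status_ok("/var/run/balena-engine.sock"))
```

A ping like this only asks the daemon whether it is responding, so it would not be expected to create containers or write to storage the way a workload-based check might. It also checks less: it would not detect, for example, a daemon that answers but can no longer start containers.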

klutchell, Jan 19 '22