redpanda
cluster: refactor disk space monitoring init
Cover letter
This was a long-dormant spinoff from adding storage_resources in Redpanda 22.2.
It addresses some technical debt:
- local_monitor was initialized as part of controller, which was not correct: it has nothing to do with controller and should run early in startup, since it only needs to see local system state. local_monitor is now initialized as a top level object in application.
- there was a hack in application.cc that constructed a temporary local_monitor because it was needed before controller startup: this is now removed.
- local_monitor had knowledge of storage_resources in order to tip it off to changes in disk space: this is replaced with a generic notification hook in storage::node_api. local_monitor now just knows how to update storage::node_api.
- local_monitor's tick frequency was coupled to the health monitor tick frequency, which was unnecessarily infrequent for checking disk space: this is now more frequent and happens every 1s.
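For readers unfamiliar with the shape of this change, here is a minimal sketch of the notification-hook idea in plain C++. It is not the code in this PR: the names node_api_sketch, local_monitor_sketch, disk_space_info, register_watcher and set_disk_space are hypothetical, and the real classes are Seastar-based rather than synchronous. It only illustrates that the monitor pushes state into one central object, and that interested parties subscribe there instead of being known to the monitor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical snapshot of local disk state (not Redpanda's actual type).
struct disk_space_info {
    std::uint64_t total_bytes{0};
    std::uint64_t free_bytes{0};
};

// Stand-in for storage::node_api: holds the latest disk state and fans
// each update out to whoever has subscribed.
class node_api_sketch {
public:
    using watcher = std::function<void(const disk_space_info&)>;

    // Register a callback; the returned index is only for illustration.
    std::size_t register_watcher(watcher cb) {
        assert(cb != nullptr); // an empty callback would throw when invoked
        _watchers.push_back(std::move(cb));
        return _watchers.size() - 1;
    }

    // Called by the monitor on every tick with the freshly observed state.
    void set_disk_space(disk_space_info info) {
        _latest = info;
        for (auto& w : _watchers) {
            w(_latest);
        }
    }

    const disk_space_info& latest() const { return _latest; }

private:
    disk_space_info _latest;
    std::vector<watcher> _watchers;
};

// Stand-in for local_monitor: it only knows how to update node_api and has
// no knowledge of any particular consumer such as storage_resources.
class local_monitor_sketch {
public:
    explicit local_monitor_sketch(node_api_sketch& api) : _api(api) {}

    // In a real system this would be driven by a periodic timer and a
    // filesystem query; here the observed state is passed in directly.
    void tick(disk_space_info observed) { _api.set_disk_space(observed); }

private:
    node_api_sketch& _api;
};
```

In this sketch a consumer would call register_watcher once at startup, and every later tick of the monitor reaches it through the central object.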
Release notes
Improvements
- Disk space checks are now more frequent, running every 1 second. This makes redpanda_storage_free_space_alert update more promptly when the system starts to run low on disk space.
@andrwng this may be of interest to you when looking at local storage trimming on disk pressure.
It cleans up the structure a bit so that we have local_monitor running in the background generating updates, and storage::node_api as the central point where anyone who is interested in disk space can subscribe for updates.
Some debug unit tests were failing in CI on abandoned failed futures from trying to stat nonexistent directories: presumably these are tests that destroy their dirs in weird places, but I suppose it could also happen in reality on a system where someone is doing crazy things. We might as well tolerate it and log a warning.
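As a rough illustration of "tolerate it and log a warning", the sketch below uses the standard library rather than Seastar, so it is an analogy and not the actual fix. The helper name try_stat_disk is hypothetical. It relies on the non-throwing std::error_code overload of std::filesystem::space, so a missing directory becomes a logged warning and an empty result instead of an unhandled failure.

```cpp
#include <cassert>
#include <filesystem>
#include <iostream>
#include <optional>
#include <system_error>

// Hypothetical helper: query disk space for `dir` without throwing.
// Returns std::nullopt (after logging a warning) if the query fails,
// for example because the directory no longer exists.
std::optional<std::filesystem::space_info>
try_stat_disk(const std::filesystem::path& dir) {
    assert(!dir.empty()); // callers are expected to pass a real path

    std::error_code ec;
    const auto info = std::filesystem::space(dir, ec);
    if (ec) {
        // Stand-in for a proper logger call at warning level.
        std::cerr << "WARN: could not stat " << dir << ": " << ec.message()
                  << '\n';
        return std::nullopt;
    }
    return info;
}
```

A caller on a periodic tick can then skip the update when the result is empty and try again on the next tick.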