
cluster: refactor disk space monitoring init

Open jcsp opened this issue 3 years ago • 1 comments

Cover letter

This was a long-dormant spinoff from adding storage_resources in Redpanda 22.2.

It addresses some technical debt:

  • local_monitor was initialized as part of controller, which was incorrect: it has nothing to do with controller and should run early in startup, since it only needs to see local system state. local_monitor is now initialized as a top-level object in application.
  • There was a hack in application.cc that constructed a temporary local_monitor because it was needed before controller startup: this is now removed.
  • local_monitor had knowledge of storage_resources in order to tip it off to changes in disk space: this is replaced with a generic notification hook in storage::node_api. local_monitor now only knows how to update storage::node_api.
  • local_monitor's tick frequency was coupled to the health monitor tick frequency, which was unnecessarily infrequent for checking disk space: the check now runs on its own timer, every 1s.
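
The decoupling described in the third bullet can be sketched roughly as follows. This is a minimal, synchronous illustration and not the code in this PR: `node_api_sketch`, `local_monitor_sketch`, `disk_space_info`, `register_disk_notification` and `set_disk_metrics` are hypothetical names chosen for the example, and the real classes are Seastar-based and asynchronous.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the disk state that gets published.
struct disk_space_info {
    uint64_t total_bytes;
    uint64_t free_bytes;
};

// Hypothetical stand-in for storage::node_api: the central point where
// anyone interested in disk space can subscribe for updates.
class node_api_sketch {
public:
    using callback_t = std::function<void(const disk_space_info&)>;

    // Register a subscriber; it is invoked on every later update.
    void register_disk_notification(callback_t cb) {
        _callbacks.push_back(std::move(cb));
    }

    // Called by the monitor; fans the update out to every subscriber.
    void set_disk_metrics(disk_space_info info) {
        assert(info.free_bytes <= info.total_bytes);
        for (auto& cb : _callbacks) {
            cb(info);
        }
    }

private:
    std::vector<callback_t> _callbacks;
};

// Hypothetical stand-in for local_monitor: it only knows about node_api,
// not about storage_resources or any other consumer.
class local_monitor_sketch {
public:
    explicit local_monitor_sketch(node_api_sketch& api)
      : _api(api) {}

    // One tick. Real code would stat the data directory; here the
    // values are injected so the sketch stays self-contained.
    void tick(uint64_t total_bytes, uint64_t free_bytes) {
        _api.set_disk_metrics({total_bytes, free_bytes});
    }

private:
    node_api_sketch& _api;
};
```

With this shape, a consumer such as storage_resources would register a callback once at startup, and the monitor never needs to know who is listening.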

Release notes

Improvements

  • Disk space checks are now more frequent, running every 1 second. This makes redpanda_storage_free_space_alert update more promptly when the system starts to run low on disk space.

jcsp, Jun 28 '22 11:06

@andrwng this may be of interest to you when looking at local storage trimming on disk pressure.

It cleans up the structure a bit so that we have local_monitor running in the background generating updates, and storage::node_api as the central point where anyone who is interested in disk space can subscribe for updates.

jcsp, Nov 16 '22 14:11

Some debug unit tests were failing in CI on abandoned failed futures from trying to stat nonexistent directories. Presumably these are tests that destroy their dirs in weird places, but I suppose it could also happen in reality on a system where someone is doing crazy things. We might as well tolerate it and log a warning.

jcsp, Nov 25 '22 14:11
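
The "tolerate it and log a warning" behaviour for a missing directory might look something like the sketch below. It is only an illustration under stated assumptions: `try_get_disk_space` is a hypothetical helper, it uses the synchronous `std::filesystem::space` rather than the Seastar futures the real code uses, and it writes the warning to `std::cerr` instead of the project's logger.

```cpp
#include <filesystem>
#include <iostream>
#include <optional>
#include <system_error>

// Hypothetical helper: query disk space for `dir`, returning nullopt
// (after logging a warning) instead of failing when the directory
// cannot be stat'ed, for example because it was removed underneath us.
inline std::optional<std::filesystem::space_info>
try_get_disk_space(const std::filesystem::path& dir) {
    std::error_code ec;
    // The error_code overload reports failure through `ec` and does not throw.
    auto info = std::filesystem::space(dir, ec);
    if (ec) {
        std::cerr << "WARN: could not stat " << dir << ": " << ec.message()
                  << '\n';
        return std::nullopt;
    }
    return info;
}
```

A caller would skip the update for that tick when the result is empty and try again on the next one.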