How should Streams handle disk alarms?
It's not clear what Streams should do when a disk alarm is triggered. The current behavior is the same as with queues: block the publishers and hope for resources to free up. However:
- Currently, stream retention is only triggered when the active segment file reaches the maximum segment file size and a new segment file is created, or when the retention configuration changes (i.e. the `x-max-length-bytes` value is updated through a policy; see the declaration sketch after this list). When publishers are blocked, the segment size won't grow, so retention is never triggered and the only way to clear the alarm is manual intervention.
- If `x-max-age` is set, data in the stream could expire, but:
  - `x-max-age` is not always set
  - even when it is set, it could take a long time for anything to expire (weeks, for example)
  - as mentioned above, retention evaluation wouldn't be triggered anyway, since publishers are blocked and the maximum segment size is never reached
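
For reference, these retention settings are driven by optional arguments at declaration time (or by the equivalent policy keys). A minimal sketch using the Python `pika` client; the stream name and values are illustrative:

```python
# Minimal sketch: declare a stream with size- and age-based retention via
# AMQP 0.9.1 optional arguments (stream name and values are illustrative).
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.queue_declare(
    queue="my-stream",  # illustrative name
    durable=True,       # streams must be declared durable
    arguments={
        "x-queue-type": "stream",
        "x-max-length-bytes": 20_000_000_000,  # retain at most ~20 GB
        "x-max-age": "7D",                     # expire data older than 7 days
        # retention is evaluated when the active segment reaches this size
        # and rolls over to a new one:
        "x-stream-max-segment-size-bytes": 100_000_000,
    },
)
connection.close()
```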
A system where the total `x-max-length-bytes` across all streams is higher than the available disk space is a production incident waiting to happen. We can warn about this in the Management UI and perhaps in some CLI commands, metrics, etc. Basic disk monitoring should also flag this situation before it actually happens (see our disk alert example).
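
A check like this can be scripted against the management HTTP API. The following is a rough sketch, assuming a default local setup (management plugin enabled, `guest`/`guest` credentials); note it only counts retention configured via queue arguments, not via policies:

```python
# Rough sketch: warn when the sum of configured x-max-length-bytes across
# all streams exceeds the free disk space reported by any node.
# ASSUMPTIONS: management plugin on localhost:15672, guest/guest credentials;
# retention set via policies (rather than queue arguments) is not counted.
import requests

AUTH = ("guest", "guest")
BASE = "http://localhost:15672/api"

queues = requests.get(f"{BASE}/queues", auth=AUTH).json()
total_retention = sum(
    q.get("arguments", {}).get("x-max-length-bytes", 0)
    for q in queues
    if q.get("arguments", {}).get("x-queue-type") == "stream"
)

for node in requests.get(f"{BASE}/nodes", auth=AUTH).json():
    disk_free = node.get("disk_free", 0)
    if node.get("disk_free_alarm"):
        print(f"ALERT: disk alarm is active on {node['name']}")
    if total_retention > disk_free:
        print(f"WARNING: total configured stream retention ({total_retention} B) "
              f"exceeds free disk space on {node['name']} ({disk_free} B)")
```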
We could consider preventing such configurations, but that would make the getting-started experience harder (e.g. stream-perf-test sets the max length to 20 GB by default; in a playground environment, this alone could exceed the available disk space).
Feel free to chime in with what you would expect to happen.
Currently there does not seem to be a way to determine how much disk space is being consumed by a stream (at least the management UI does not show it). It would be very nice to be able to see that, so you are not left guessing what is slowly depleting your free disk space.
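
Until such a metric is exposed, one workaround is to measure a stream's segment files directly on the node's file system. A rough sketch; the data directory path is an assumption and varies between installations and versions:

```python
# Rough sketch: estimate a stream's on-disk footprint by summing the sizes
# of its segment and index files. The path passed on the command line is
# an ASSUMPTION -- the stream data directory layout differs between
# installations and versions; locate it under the node's data directory.
import os
import sys

def stream_disk_usage(stream_dir: str) -> int:
    """Sum the sizes of all files under the given stream directory."""
    total = 0
    for root, _dirs, files in os.walk(stream_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

if __name__ == "__main__":
    path = sys.argv[1]  # e.g. <data-dir>/stream/<stream-id> (hypothetical)
    print(f"{path}: {stream_disk_usage(path):,} bytes")
```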
Update: starting with RabbitMQ 3.11.4, `max-age` retention is evaluated every hour.