csm
[FEATURE]: Trigger alarms when volume access latency exceeds norms
Describe the solution you'd like I'm submitting this on behalf of Itzik, as it came up in a meeting this morning. He described the need to have some kind of running average of the latency for I/O operations to complete on the array, perhaps broken down by node, by storage pool, by particular volume, or overall for the array. The metrics would keep a history of past values (perhaps an Exponential Moving Average that weights recent samples more heavily than the distant past), and if the latency for an item exceeded that norm (the moving average) by some percentage, an alarm event of some kind would be triggered (perhaps a Grafana alarm).
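To make the idea concrete, here is a rough sketch of the check described above: one exponential moving average per tracked item, with an alarm when a new sample exceeds the average by a configured percentage. This is only an illustration, not a proposed implementation. The names `LatencyMonitor`, `observe`, `alpha`, and `threshold_pct`, the default values, and the `(kind, name)` keys are all assumptions made up for this example, and nothing here reflects existing CSM code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class LatencyMonitor:
    """Tracks an exponential moving average (EMA) of I/O latency for one item.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    alpha: float = 0.2            # smoothing factor: higher weights recent samples more
    threshold_pct: float = 50.0   # alarm when a sample exceeds the EMA by this percentage
    ema: Optional[float] = None   # current norm; None until the first sample arrives

    def observe(self, latency_ms: float) -> bool:
        """Record one latency sample and return True if it should raise an alarm."""
        if self.ema is None:
            # The first sample only seeds the average; there is no norm to compare against yet.
            self.ema = latency_ms
            return False
        # Compare against the norm *before* folding the new sample into it.
        alarm = latency_ms > self.ema * (1 + self.threshold_pct / 100)
        self.ema = self.alpha * latency_ms + (1 - self.alpha) * self.ema
        return alarm


# One monitor per tracked item, keyed by (kind, name), e.g. node, storage pool, volume, or array.
monitors: Dict[Tuple[str, str], LatencyMonitor] = {}


def record(kind: str, name: str, latency_ms: float) -> bool:
    """Route a sample to the monitor for its item, creating the monitor on first use."""
    monitor = monitors.setdefault((kind, name), LatencyMonitor())
    return monitor.observe(latency_ms)
```

A caller would feed each sample through `record("volume", "vol-1", latency_ms)` and forward a `True` result to whatever alerting path is chosen. Comparing against the average before updating it is a deliberate choice in this sketch, so that a single spike cannot raise the norm it is being measured against.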
Describe alternatives you've considered Similar facilities are generally already available in CloudIQ and the various array user interfaces. However, they do not report to the Kubernetes admins. Additionally, this might allow some kind of Kubernetes automation to be built around the alarm.
Additional context This is an enhancement, not to be considered a bug. We can discuss priority and possible implementations.
@rbo54: Thank you for submitting this issue!
The issue is currently awaiting triage. Please make sure you have given us as much context as possible.
If the maintainers determine this is a relevant issue, they will remove the needs-triage label and assign an appropriate priority label.
We want your feedback! If you have any questions or suggestions regarding our contributing process/workflow, please reach out to us at [email protected].
/sync
@rbo54, thank you for submitting this feature request. We are keeping this feature in our backlog, as it would be really nice to have in place.
I've added the "help wanted" label to solicit help from the community on building out this feature.