graphite-beacon
Add support for interval cron expressions
I am using graphite-beacon to monitor a production service that sees varying load patterns throughout the day and by day of week. I've added the ability to use cron syntax to schedule checks, rather than a fixed interval. This allows me to have alerts with certain thresholds for weekdays and work hours, with different thresholds for off-peak hours.
Bump for @klen @GarrettHeel, any interest?
Thanks for the PR @jbrody1. This looks useful - I'm just waiting to see how some other changes fall out before looking at this more seriously.
Would the ability to write more expressive rules also meet your needs? (e.g. "warning: < 200MB AND it is Wednesday")
More expressive rules would be great. In my case, I need rules that are enforced not only on certain days (weekdays), but certain times of day (working hours), excluding specific days (holidays), etc. Thinking about the best way to express these conditions, it could be either a new, custom syntax for chronological expressions, or cron itself. Given this choice, I went with cron.
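To illustrate how cron expresses these conditions, here is a minimal sketch of matching a timestamp against a five-field cron expression. It is not the code from this PR: `cron_matches`, `active_rules`, and the `RULES` table are hypothetical names, and only `*`, comma lists, and ranges are handled (no step syntax such as `*/5`).

```python
from datetime import datetime


def _field_matches(field, value):
    """Return True if one cron field ('*', '9-17', '0,6', ...) admits value."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr, when):
    """Check a 'minute hour day-of-month month day-of-week' expression.

    Simplification: all five fields must match. Real cron ORs day-of-month
    and day-of-week when both are restricted.
    """
    minute, hour, dom, month, dow = expr.split()
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(dom, when.day)
        and _field_matches(month, when.month)
        # cron counts Sunday as 0; isoweekday() counts Sunday as 7.
        and _field_matches(dow, when.isoweekday() % 7)
    )


# Hypothetical schedules: hourly during weekday work hours, hourly on weekends.
RULES = {
    "peak": "0 9-17 * * 1-5",
    "weekend": "0 * * * 0,6",
}


def active_rules(when):
    """Return the names of the schedules that fire at `when`."""
    return [name for name, expr in RULES.items() if cron_matches(expr, when)]


print(active_rules(datetime(2024, 1, 3, 10, 0)))  # a Wednesday morning
print(active_rules(datetime(2024, 1, 6, 10, 0)))  # a Saturday morning
```

Each schedule can then carry its own thresholds. Holiday exclusions would still need an extra check outside the cron expression.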
I'm interested to see what else is coming for graphite-beacon. In the meantime I'll resolve the conflicts in case you want this PR.
Bump. Any interest? I've been running this in production for several months now and it has been working well.
Bump @garrettheel. We're still using this in production. I'd love to get back on the main codeline.
Changes Unknown when pulling 32e3c755844d3219dce05a4a24c7088a6c641a9b on jbrody1:develop into * on klen:develop*.
Changes Unknown when pulling f36744cecd244ebaf0522f9532d0f61458e0790a on jbrody1:develop into * on klen:develop*.
Thanks for the comments @garrettheel. I've implemented the changes you requested. As you can see, there was an issue with implementing historical values for cron expressions. The possible solutions were:
1. Continue using only time-based historical windows everywhere (minimizes syntax changes, but adds complexity for cron intervals).
2. Support either time-based or size-based historical windows (adds syntax changes and complexity).
3. Support time-based or size-based historical windows for fixed intervals, but only size-based historical windows for cron intervals (adds syntax changes, but minimizes complexity).
I went with #1 to minimize the surface area of the change on syntax/API. However, this has the side effect of potentially using fewer data points if there are large gaps in the cron schedule. I think #3 might be functionally better, though it puts more burden on the user.
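The side effect can be seen in a small sketch comparing the two window types. This assumes a check that runs hourly from 09:00 to 17:00 on weekdays; `time_window` and `size_window` are hypothetical helpers, not functions from graphite-beacon.

```python
from datetime import datetime, timedelta


def time_window(samples, now, span):
    """Time-based history: keep values recorded within `span` before `now`."""
    return [value for ts, value in samples if now - span <= ts <= now]


def size_window(samples, size):
    """Size-based history: keep the last `size` values, however old they are."""
    return [value for _, value in samples[-size:]] if size > 0 else []


# Friday 2024-01-05: one sample per hour from 09:00 to 17:00.
samples = [(datetime(2024, 1, 5, hour, 0), float(hour)) for hour in range(9, 18)]
# The schedule then skips the weekend; the next sample is Monday 09:00.
monday_9am = datetime(2024, 1, 8, 9, 0)
samples.append((monday_9am, 9.0))

# A 6-hour time window only reaches back to Monday 03:00.
recent_by_time = time_window(samples, monday_9am, timedelta(hours=6))
# A 6-sample size window reaches back across the weekend gap.
recent_by_size = size_window(samples, 6)

print(len(recent_by_time), len(recent_by_size))
```

With the time-based window, the first check after a gap has only its own value to compare against, while the size-based window still sees Friday afternoon's samples.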
Please take a look, and let me know if you have a preference.
Thanks, John
Ideally I'd like to see (2) but (3) would be fine for now also. Can you leave all of the history changes out of this PR and address them in a separate one? Let's stick to size-based and figure out a way to add time-based in a non-breaking way later.
I'm happy for you to raise an "Unsupported" error if you don't want to implement history for cron intervals right now (just be sure to document it).
Otherwise this is really great and pretty much ready to merge!
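The "Unsupported" fallback suggested above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `cron` option name, the `validate_alert` helper, the `UnsupportedOptionError` class, and the idea that a rule refers to history through the word `historical`.

```python
class UnsupportedOptionError(ValueError):
    """Raised when an alert combines options that are not supported together."""


def validate_alert(options):
    """Reject alerts that use historical rules together with a cron schedule.

    `options` is a plain dict of alert settings. The 'cron' and 'rules' keys
    are hypothetical names chosen for this sketch.
    """
    uses_cron = "cron" in options
    uses_history = any("historical" in rule for rule in options.get("rules", []))
    if uses_cron and uses_history:
        raise UnsupportedOptionError(
            "Unsupported: historical rules cannot be used with a cron "
            "interval (alert %r)" % options.get("name")
        )
    return options


# A cron-scheduled alert with a fixed threshold passes validation.
validate_alert({"name": "load", "cron": "0 9-17 * * 1-5", "rules": ["warning: > 5"]})

# A cron-scheduled alert that refers to history is rejected.
try:
    validate_alert(
        {"name": "load", "cron": "0 9-17 * * 1-5", "rules": ["warning: > historical * 2"]}
    )
except UnsupportedOptionError as error:
    print(error)
```

Failing at configuration time rather than at check time means the limitation shows up as soon as the alert is loaded.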