monitoring
Question RE: Failure to remediate
During a failure condition, what is Watchdog's stance on failure to remediate? Let's say I have an Incident attached to a service. It trips the threshold, and for some reason the code cannot fix the underlying condition (the service fails to start, etc.). Currently Watchdog just keeps trying forever. Is this the intended behavior, or are there any plans for a maximum attempt / give up / bail option on this monitor?
I'm testing some DSL extensions to 'unmonitor' a service. Examples are here: https://github.com/phobos182/watchdog-examples/blob/master/watchdog.py
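
To make the "maximum attempt / give up" idea concrete, here is a rough sketch of the behavior being asked about. It is plain Python and does not use Watchdog's actual classes. `BoundedRemediation`, `on_incident`, and `restart_service` are hypothetical names chosen for illustration only, and this is not the code in the linked examples.

```python
from typing import Callable


class BoundedRemediation:
    """Hypothetical wrapper (not part of Watchdog): stop retrying a
    remediation after `max_attempts` consecutive failures."""

    def __init__(self, remediate: Callable[[], bool], max_attempts: int = 3):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.remediate = remediate        # returns True when the fix worked
        self.max_attempts = max_attempts
        self.failures = 0                 # consecutive failed attempts
        self.monitored = True             # False once we have given up

    def on_incident(self) -> bool:
        """Call each time the threshold trips. Returns True if remediated."""
        if not self.monitored:
            return False                  # already unmonitored, do nothing
        if self.remediate():
            self.failures = 0             # success resets the counter
            return True
        self.failures += 1
        if self.failures >= self.max_attempts:
            self.monitored = False        # give up: "unmonitor" the service
        return False


def restart_service() -> bool:
    """Stand-in for a remediation that never succeeds."""
    return False


guard = BoundedRemediation(restart_service, max_attempts=3)
for _ in range(5):
    guard.on_incident()
print(guard.failures, guard.monitored)  # prints: 3 False
```

With something like this, the remediation runs three times and is then skipped on later incidents, instead of being retried forever.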