semantic-metrics

If a gauge throws an exception, the reporter should continue emitting metrics

Open. lmuhlha opened this issue 4 years ago • 3 comments

lmuhlha, Apr 24 '20 19:04

@lmuhlha Is this problem related to FastForwardReporter or FastForwardHttpReporter? Could you add the stack trace?

ao2017, Aug 20 '20 17:08

There is no stack trace yet; this behavior still needs to be tested.

It came up as a result of a discussion around this PR: https://github.com/spotify/semantic-metrics/pull/61

More context on the conversation: "The use case I have is that a component of my system might be unhealthy, causing a gauge to fail and throw an exception. In that case I want the gauge not to be emitted, so that Grafana can alert me about missing data. With this PR I could catch the exception and return null. Without this PR I guess I can return 0 (which could be confusing or misleading and produce a graph that looks like things are healthy). Or maybe I can return some obviously bad value like -1 or Integer.MIN_VALUE, but that feels hacky and clunky to alert on."

"Yeah, we also usually don't suggest alerting on null, because missing data could just be the result of the pipeline being down, which makes such alerts noisy or incorrect. Part of our discussion was also that if people were already doing things oddly or incorrectly and getting a 0, and then suddenly got no data, they would think the pipeline was broken as well."

"That's true, but if monitoring data is missing, that's something I want to be alerted on in this case. I guess an alternative here could be to make the FastForwardReporter tolerate exceptions thrown from gauges? Currently, if a gauge throws an exception, I think it breaks the reporter and prevents the rest of the metrics from being emitted."
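For illustration, here is a minimal sketch of the first approach quoted above: catch the exception inside the gauge and return null so the series goes missing instead of showing a misleading 0 or -1. It assumes the reporter skips gauges whose value is null, which is how the PR linked above is described; that skipping behavior is modeled here, not taken from semantic-metrics. Gauge is a local stand-in shaped like Dropwizard's com.codahale.metrics.Gauge, and nullOnFailure and collect are hypothetical helpers, not library API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in shaped like Dropwizard's com.codahale.metrics.Gauge<T>,
// declared locally so this sketch compiles on its own.
interface Gauge<T> {
    T getValue();
}

public class NullOnFailureGauge {

    // Wraps a health-dependent reading: if the reading throws because the
    // component is unhealthy, report "no value" (null) instead of a fake number.
    static Gauge<Integer> nullOnFailure(Supplier<Integer> reading) {
        return () -> {
            try {
                return reading.get();
            } catch (RuntimeException e) {
                return null;
            }
        };
    }

    // Assumed reporter behavior (not semantic-metrics code): gauges that
    // return null are skipped, so their series is simply absent downstream.
    static Map<String, Object> collect(Map<String, Gauge<?>> gauges) {
        Map<String, Object> emitted = new LinkedHashMap<>();
        for (Map.Entry<String, Gauge<?>> entry : gauges.entrySet()) {
            Object value = entry.getValue().getValue();
            if (value != null) {
                emitted.put(entry.getKey(), value);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        Map<String, Gauge<?>> gauges = new LinkedHashMap<>();
        gauges.put("queue-size", nullOnFailure(() -> 42));
        gauges.put("db-connections", nullOnFailure(() -> {
            throw new IllegalStateException("database unreachable");
        }));
        System.out.println(collect(gauges)); // prints {queue-size=42}
    }
}
```

The trade-off raised in the thread applies here: a missing series is easy to alert on, but it looks the same as the pipeline being down.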

This ticket is meant to confirm that behavior and, if it is confirmed, to implement a fix.
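The proposed fix (making the reporter tolerate exceptions from gauges) could look roughly like the loop below. This is a hypothetical model of one report pass, not the actual FastForwardReporter code; each gauge is represented by a Supplier standing in for getValue(), and TolerantReportLoop, emitted, and failed are illustrative names. The key idea is that each gauge read gets its own try/catch, so one failing gauge is skipped while the rest are still emitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of a reporter's per-report loop (NOT FastForwardReporter).
// A Supplier stands in for a gauge's getValue().
public class TolerantReportLoop {

    // Values successfully read during the report pass, in iteration order.
    final Map<String, Object> emitted = new LinkedHashMap<>();
    // Names of gauges that threw, so failures stay visible (e.g. for logging).
    final List<String> failed = new ArrayList<>();

    void report(Map<String, Supplier<?>> gauges) {
        for (Map.Entry<String, Supplier<?>> entry : gauges.entrySet()) {
            Object value;
            try {
                value = entry.getValue().get();
            } catch (RuntimeException e) {
                // Isolate the failure to this gauge: record it, skip it,
                // and continue so the remaining metrics are still emitted.
                failed.add(entry.getKey());
                continue;
            }
            if (value != null) {
                emitted.put(entry.getKey(), value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Supplier<?>> gauges = new LinkedHashMap<>();
        gauges.put("requests", () -> 10);
        gauges.put("broken", () -> { throw new IllegalStateException("unhealthy"); });
        gauges.put("latency-ms", () -> 250);

        TolerantReportLoop loop = new TolerantReportLoop();
        loop.report(gauges);
        System.out.println(loop.emitted); // prints {requests=10, latency-ms=250}
        System.out.println(loop.failed);  // prints [broken]
    }
}
```

The sketch catches RuntimeException rather than Throwable as a design choice: gauges have no checked exceptions to declare, and serious JVM errors such as OutOfMemoryError are left to propagate. A test confirming the original behavior could feed the reporter a throwing gauge placed before a healthy one and check whether the healthy gauge is still emitted.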

lmuhlha, Aug 20 '20 18:08

Great, thank you for the info.

ao2017, Aug 20 '20 19:08