synthetic-monitoring-agent

A timeout in the middle of an HTTP request results in strange metrics

Open · mem opened this issue 2 years ago · 2 comments

If the check times out during an HTTP request (e.g. after name resolution is done but before the transfer is complete), the context will be canceled and some code paths will result in negative durations.

E.g. if "start" is set to the current time and "transfer complete" is still 0, then "transfer complete" - "start" results in a negative duration.

The issue is possibly in BBE (blackbox_exporter), but we are the ones making it visible.

This is possibly present in other checks that compute phases.

mem avatar May 11 '22 20:05 mem

Thank you for raising this issue. I think it was raised due to the support query that I (or perhaps also others) raised, as it was pointed out by the Grafana Cloud support team. I'm seeing a negative duration of 292 years. I can understand asynchronous events being able to become a small amount negative, but if this is what I experienced then I would expect that there is an invalid value or conversion somewhere.

Fydon avatar May 12 '22 09:05 Fydon

Sorry, I think I understand what you are saying now. The start time is initialised to the current timestamp and transfer complete is initialised to zero, so the "duration" becomes the value of the start time timestamp.

Hopefully this can either be resolved, e.g. by initialising transfer complete to the start time or by checking whether transfer complete is 0 (although it's probably not that simple), or worked around, e.g. by reducing negative durations to -1 (to expose the problem while not breaking graphs) or 0 (to hide the problem). I've had this problem occur on Grafana Cloud a few times.

Fydon avatar Apr 18 '24 18:04 Fydon