
Fastly error rate alarms are spammy

Open pnorman opened this issue 4 months ago • 12 comments

On the 11th and 12th I got about 100 notifications about Fastly error rate alarms with the OSM community CDN. I'm not sure what was going on, but it looks like the alarm was flapping for some of that time.

pnorman avatar Aug 14 '25 04:08 pnorman

I suspect there is a routing issue causing the failures. Needs investigation.

Firefishy avatar Aug 14 '25 11:08 Firefishy

The errors appear to be due to first byte timeouts.

Firefishy avatar Aug 14 '25 15:08 Firefishy

I have silenced the error.

Firefishy avatar Aug 14 '25 18:08 Firefishy

Are we leaving the error permanently silenced?

pnorman avatar Aug 15 '25 09:08 pnorman

I hope not! But no, I assume @Firefishy set a time-limited silence, as I didn't notice a commit to change the alerts.

tomhughes avatar Aug 15 '25 09:08 tomhughes

3 day time limit

Firefishy avatar Aug 15 '25 09:08 Firefishy

Okay, then I'll leave this issue open as the alarm issue remains regardless of the routing issue.

pnorman avatar Aug 15 '25 09:08 pnorman

What "alarm issue" is that exactly? Fastly were having intermittent problems reaching the origin server from one or more POPs so there were intermittent alarms.

What exactly would you have liked to be different?

tomhughes avatar Aug 15 '25 09:08 tomhughes

The issue was that one event caused 100 alarms as different POPs caused different alarms to fire and resolve.

pnorman avatar Aug 15 '25 09:08 pnorman

Was it one event or was it a series of events? Either way, unless you know something I don't, there isn't a way to fix that.

tomhughes avatar Aug 15 '25 09:08 tomhughes

As a trial in reducing the flappy alerts I have set keep_firing_for values for a number of different alerts: https://github.com/openstreetmap/chef/commit/26b1bdb9ddc8781526b9597ad79b0c566e4a6aaf

Firefishy avatar Aug 21 '25 21:08 Firefishy

That should help cut down on noise

pnorman avatar Aug 22 '25 09:08 pnorman
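
To illustrate why a keep_firing_for value should cut down on flapping, here is a minimal Python sketch. It is not the code from the linked chef commit. It assumes Prometheus-style semantics, where an alert keeps firing for a hold period after its expression stops matching, and it ignores any pending period and any grouping done by the notification system. The function names and the sample data are hypothetical.

```python
from typing import List


def alert_states(condition: List[bool], keep_firing_evals: int = 0) -> List[bool]:
    """Return whether the alert is firing at each rule evaluation.

    condition[i] is True when the alert expression matched at evaluation i.
    keep_firing_evals is the hold period, expressed as a number of evaluation
    intervals (an assumption made to keep the model simple). Once the
    expression stops matching, the alert stays firing for that many
    evaluations and resolves only if the expression is still clear after that.
    """
    states = []
    firing = False
    clear_streak = 0  # consecutive evaluations where the expression was clear
    for matched in condition:
        if matched:
            firing = True
            clear_streak = 0
        elif firing:
            clear_streak += 1
            if clear_streak > keep_firing_evals:
                firing = False
        states.append(firing)
    return states


def count_notifications(states: List[bool]) -> int:
    """Count state changes (firing or resolved), one notification per change."""
    transitions = 0
    previous = False
    for state in states:
        if state != previous:
            transitions += 1
        previous = state
    return transitions


# Hypothetical flapping pattern: the error condition toggles on every
# evaluation five times, then stays clear.
flapping = [True, False] * 5 + [False] * 3

without_hold = count_notifications(alert_states(flapping, keep_firing_evals=0))
with_hold = count_notifications(alert_states(flapping, keep_firing_evals=2))
print(without_hold, with_hold)
```

In this toy model the same underlying condition produces 10 state changes without a hold period and 2 with a hold of two evaluations (one firing, one resolved). The trade-off is that the resolved notification arrives later than it otherwise would.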