Staging - [Alerting] Queue Insights Failures alert

Open dotnet-eng-status-staging[bot] opened this issue 3 years ago • 9 comments

:broken_heart: Metric state changed to alerting

Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.

Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit

Go to rule

@dotnet/dnceng, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-763d449c7cd747a786373befe76ad19b

:green_heart: Metric state changed to ok

Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.

Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit

Metric Graph

Go to rule

@melotic I just noticed that the wiki link in this alert opens the wiki page in edit mode. If you have time, would you please change the link to point to the regular page?

garath avatar Aug 05 '22 03:08 garath

:broken_heart: Metric state changed to alerting

Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.

Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit

Go to rule

:green_heart: Metric state changed to ok

Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.

Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit

Metric Graph

Go to rule

:broken_heart: Metric state changed to alerting

Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.

Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit

Go to rule

:green_heart: Metric state changed to ok

Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.

Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit

Go to rule

@AlitzelMendez it seems that the ConfigureAwait was not the issue :(

melotic avatar Aug 08 '22 16:08 melotic

:broken_heart: Metric state changed to alerting

Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.

Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit

Go to rule

:green_heart: Metric state changed to ok

Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.

Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit

Metric Graph

Go to rule

:green_heart: Metric state changed to ok

Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.

Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit

Metric Graph

Go to rule

Everything appears to be functioning normally. I am closing this alert. If we experience this problem again, we will investigate.

ilyas1974 avatar Aug 15 '22 14:08 ilyas1974

For context on what changed (and hopefully sticks this time): QueueInsights now retries when there's a sporadic Kusto exception.

riarenas avatar Aug 15 '22 14:08 riarenas
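
For readers unfamiliar with the pattern, the retry behavior described in the last comment can be sketched roughly as below. This is not the QueueInsights code, which is not shown in this thread. `TransientQueryError`, `retry_on_transient`, the attempt count, and the backoff delays are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientQueryError(Exception):
    """Hypothetical stand-in for a sporadic query failure (e.g. from Kusto)."""


def retry_on_transient(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on TransientQueryError with exponential backoff.

    Any other exception propagates immediately. The last transient failure
    is re-raised once `max_attempts` is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientQueryError:
            if attempt == max_attempts:
                raise  # out of attempts: let the failure surface to the caller
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Retrying only a narrow exception type keeps genuine problems, such as invalid data in the Matrix of Truth, visible as alerts instead of being masked by the retry loop.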