Staging - [Alerting] Queue Insights Failures alert
:broken_heart: Metric state changed to alerting
Queue Insights has thrown an unhandled exception and failed to generate its check. This could be caused by invalid data in the Matrix of Truth, or some other component failing.
Wiki Page: https://dev.azure.com/dnceng/internal/_wiki/wikis/DNCEng%20Services%20Wiki?wikiVersion=GBwikiMaster&pagePath=/FR%20Operations/Wiki%20for%20Grafana%20Alerts/%5BAlerts%5D%20Queue%20Insights&pageId=956&_a=edit
@dotnet/dnceng, please investigate
Automation information below, do not change
Grafana-Automated-Alert-Id-992309c92835448d815d22588ee67d0c
:green_heart: Metric state changed to ok

These look like one-off Kusto failures. I think we'll just need to make the alert less sensitive.
What is the evidence? (Just curious about Kusto errors)
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok

Assigned to @melotic as this is his recently-created alert...
These look like one-off Kusto failures. I think we'll just need to make the alert less sensitive.
What is the evidence? (Just curious about Kusto errors)
See this Application Insights (AI) query:
Kusto client failed to send a request to the service: The response ended prematurely..
Please provide the following information when contacting the Kusto team @ https://aka.ms/kustosupport :
DataSource='https://engsrvprod.kusto.windows.net/v1/rest/query',
DatabaseName='engineeringdata',
ClientRequestId='KD2RunQuery;5fad2bd4-5faf-4e45-ad3b-d7f1c862fd2b',
Timestamp='2022-08-01T11:05:28.6136850Z'.
I'm not sure exactly what this error means. It happens sporadically.
Consider whether there is a reasonable way to avoid logging the exception in the first place.
Does the client code in Build Analysis catch or retry these events? If it doesn't, maybe it should.
PR is out to retry these Kusto exceptions: https://dev.azure.com/dnceng/internal/_git/dotnet-helix-service/pullrequest/24754
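For readers unfamiliar with the pattern, here is a minimal sketch of retrying a transient query failure with exponential backoff. It is written in Python purely for illustration and is not the code from the PR above. `TransientQueryError` and `run_with_retry` are hypothetical names, and the attempt count and delays are assumed values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientQueryError(Exception):
    """Hypothetical stand-in for a transient client failure, such as
    'The response ended prematurely'."""


def run_with_retry(
    query: Callable[[], T],
    max_attempts: int = 3,      # assumed value, tune for the real service
    base_delay: float = 0.5,    # seconds before the first retry (assumed)
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `query`, retrying on TransientQueryError with exponential backoff.

    Any other exception propagates immediately. The last transient error is
    re-raised once `max_attempts` is exhausted, so persistent failures still
    surface to the caller (and to alerting).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return query()
        except TransientQueryError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            # Exponential backoff plus up to 10% jitter: ~0.5s, ~1s, ~2s, ...
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

With a wrapper like this, a single sporadic failure is absorbed by a retry and never reaches the exception log, while a sustained outage still raises after the final attempt and can trigger the alert.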
PR has been merged to staging. Closing alert.