Production - [Alerting] Android devices disconnected

Open dotnet-eng-status[bot] opened this issue 2 years ago • 26 comments

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-055} 100

Go to rule

@dotnet/dnceng, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-35f560112f7a4bfabf9fd69bc1bd76fa

dotnet-eng-status[bot] avatar Sep 12 '22 13:09 dotnet-eng-status[bot]

IcM ticket for the machine already exists -> https://portal.microsofticm.com/imp/v3/incidents/details/334311616/home

Also, disabled the machine

oleksandr-didyk avatar Sep 12 '22 14:09 oleksandr-didyk

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-055} 100

Go to rule

dotnet-eng-status[bot] avatar Sep 13 '22 02:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-055} 100
  • FailureRate {Machine=DNCENGWIN-116} 100

Go to rule

dotnet-eng-status[bot] avatar Sep 13 '22 14:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-055} 100
  • FailureRate {Machine=DNCENGWIN-078} 94
  • FailureRate {Machine=DNCENGWIN-116} 93

Go to rule

dotnet-eng-status[bot] avatar Sep 14 '22 02:09 dotnet-eng-status[bot]

Disabled 116 and 078 + created an IcM ticket for them -> https://portal.microsofticm.com/imp/v3/incidents/details/334893393/home

Re-enabled 055, as its IcM ticket has been successfully handled

oleksandr-didyk avatar Sep 14 '22 13:09 oleksandr-didyk

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-055} 100
  • FailureRate {Machine=DNCENGWIN-078} 100
  • FailureRate {Machine=DNCENGWIN-116} 85

Go to rule

dotnet-eng-status[bot] avatar Sep 14 '22 14:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-022} 85
  • FailureRate {Machine=DNCENGWIN-078} 94
  • FailureRate {Machine=DNCENGWIN-116} 85

Go to rule

dotnet-eng-status[bot] avatar Sep 15 '22 02:09 dotnet-eng-status[bot]

Disabled 022 and added it to the already open IcM ticket

oleksandr-didyk avatar Sep 15 '22 09:09 oleksandr-didyk

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-022} 92
  • FailureRate {Machine=DNCENGWIN-066} 90
  • FailureRate {Machine=DNCENGWIN-078} 100
  • FailureRate {Machine=DNCENGWIN-116} 85

Go to rule

dotnet-eng-status[bot] avatar Sep 15 '22 14:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-022} 92
  • FailureRate {Machine=DNCENGWIN-066} 94
  • FailureRate {Machine=DNCENGWIN-078} 94
  • FailureRate {Machine=DNCENGWIN-116} 85

Go to rule

dotnet-eng-status[bot] avatar Sep 16 '22 03:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-022} 92
  • FailureRate {Machine=DNCENGWIN-078} 100
  • FailureRate {Machine=DNCENGWIN-116} 85

Go to rule

dotnet-eng-status[bot] avatar Sep 16 '22 15:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-022} 92
  • FailureRate {Machine=DNCENGWIN-116} 83

Go to rule

dotnet-eng-status[bot] avatar Sep 17 '22 03:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-022} 92

Go to rule

dotnet-eng-status[bot] avatar Sep 17 '22 15:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-022} 92

Go to rule

dotnet-eng-status[bot] avatar Sep 18 '22 03:09 dotnet-eng-status[bot]

:green_heart: Metric state changed to ok

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

Go to rule

dotnet-eng-status[bot] avatar Sep 18 '22 07:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-075} 100

Go to rule

dotnet-eng-status[bot] avatar Sep 20 '22 08:09 dotnet-eng-status[bot]

:green_heart: Metric state changed to ok

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

Go to rule

dotnet-eng-status[bot] avatar Sep 20 '22 18:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-058} 100
  • FailureRate {Machine=DNCENGWIN-120} 100

Go to rule

dotnet-eng-status[bot] avatar Sep 21 '22 08:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-058} 92
  • FailureRate {Machine=DNCENGWIN-120} 96

Go to rule

dotnet-eng-status[bot] avatar Sep 21 '22 20:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-006} 100
  • FailureRate {Machine=DNCENGWIN-051} 88
  • FailureRate {Machine=DNCENGWIN-120} 93

Go to rule

dotnet-eng-status[bot] avatar Sep 22 '22 08:09 dotnet-eng-status[bot]

After having a look in Kusto, DNCENGWIN-006 doesn't appear to be broken; it just failed the most recent few work items. Same story for DNCENGWIN-051. DNCENGWIN-120 does appear to be broken, so I will take it offline and create an IcM ticket

dkurepa avatar Sep 22 '22 09:09 dkurepa

IcM: https://portal.microsofticm.com/imp/v3/incidents/details/336719202/home

dkurepa avatar Sep 22 '22 09:09 dkurepa

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-006} 100
  • FailureRate {Machine=DNCENGWIN-051} 90
  • FailureRate {Machine=DNCENGWIN-058} 83
  • FailureRate {Machine=DNCENGWIN-120} 94

Go to rule

dotnet-eng-status[bot] avatar Sep 22 '22 21:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-006} 83
  • FailureRate {Machine=DNCENGWIN-120} 94

Go to rule

dotnet-eng-status[bot] avatar Sep 23 '22 09:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-120} 94

Go to rule

dotnet-eng-status[bot] avatar Sep 23 '22 21:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-006} 100
  • FailureRate {Machine=DNCENGWIN-120} 94

Go to rule

dotnet-eng-status[bot] avatar Sep 24 '22 09:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-006} 100
  • FailureRate {Machine=DNCENGWIN-120} 94

Go to rule

dotnet-eng-status[bot] avatar Sep 24 '22 21:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-006} 100

Go to rule

dotnet-eng-status[bot] avatar Sep 25 '22 09:09 dotnet-eng-status[bot]

:green_heart: Metric state changed to ok

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

Go to rule

dotnet-eng-status[bot] avatar Sep 25 '22 11:09 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=DNCENGWIN-006} 100

Go to rule

dotnet-eng-status[bot] avatar Sep 25 '22 11:09 dotnet-eng-status[bot]
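
Since the alert re-fires every 12 hours and the machine list changes between firings, it can help to compare consecutive alert comments instead of reading each list by eye. Below is a minimal sketch of one way to do that, assuming the comment bodies are available as plain text in the `FailureRate {Machine=...} N` form shown above. The names `parse_alert` and `diff_alerts` are hypothetical helpers for illustration and are not part of any dnceng tooling.

```python
import re

# Matches alert lines such as "FailureRate {Machine=DNCENGWIN-055} 100".
ALERT_LINE = re.compile(r"FailureRate \{Machine=(?P<machine>[\w-]+)\}\s+(?P<rate>\d+)")


def parse_alert(comment: str) -> dict[str, int]:
    """Return {machine: failure rate} for every alert line in one comment body."""
    return {m["machine"]: int(m["rate"]) for m in ALERT_LINE.finditer(comment)}


def diff_alerts(previous: dict[str, int], current: dict[str, int]) -> dict[str, list[str]]:
    """Classify machines as newly alerting, still alerting, or no longer listed."""
    return {
        "new": sorted(current.keys() - previous.keys()),
        "persisting": sorted(current.keys() & previous.keys()),
        "recovered": sorted(previous.keys() - current.keys()),
    }


if __name__ == "__main__":
    # Two consecutive alert bodies copied from this thread (Sep 13, 02:00 and 14:00).
    earlier = parse_alert("  • FailureRate {Machine=DNCENGWIN-055} 100")
    later = parse_alert(
        "  • FailureRate {Machine=DNCENGWIN-055} 100\n"
        "  • FailureRate {Machine=DNCENGWIN-116} 100"
    )
    print(diff_alerts(earlier, later))
```

A machine that keeps showing up under `persisting` across several firings is a reasonable candidate for a closer look, whereas one that appears once and then moves to `recovered` may only have failed its most recent few work items. Note that a machine dropping off the list does not by itself prove it is healthy; a disabled machine also stops reporting.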