Production - [Alerting] Apple simulator failure rate alert

Open dotnet-eng-status[bot] opened this issue 3 years ago • 7 comments

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-036} 98
  • FailureRate {Machine=dci-mac-build-171} 94

Go to rule

@dotnet/dnceng, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-36d07fceeaf0472b804d8358b2198eac

dotnet-eng-status[bot] avatar Aug 10 '22 10:08 dotnet-eng-status[bot]
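Since the alert re-fires every 12 hours and the machine list can change between firings, it can help to compare the lists mechanically rather than by eye. Below is a minimal sketch, assuming the comment text is available as a plain string and that metric lines keep the `FailureRate {Machine=...} N` shape shown above. The helper names `parse_failure_rates` and `diff_machines` are hypothetical and are not part of any dnceng tooling.

```python
import re

# Matches metric lines such as "FailureRate {Machine=dci-mac-build-036} 98".
# The line shape is taken from the alert comment; everything else is an assumption.
ALERT_LINE = re.compile(r"FailureRate \{Machine=(?P<machine>[\w-]+)\}\s+(?P<rate>\d+)")


def parse_failure_rates(comment: str) -> dict[str, int]:
    """Return {machine name: failure rate} for every metric line in a comment."""
    return {m["machine"]: int(m["rate"]) for m in ALERT_LINE.finditer(comment)}


def diff_machines(previous: dict[str, int], current: dict[str, int]) -> tuple[set[str], set[str]]:
    """Return (machines newly alerting, machines no longer alerting)."""
    return set(current) - set(previous), set(previous) - set(current)


if __name__ == "__main__":
    earlier = parse_failure_rates(
        "FailureRate {Machine=dci-mac-build-036} 98\n"
        "FailureRate {Machine=dci-mac-build-171} 94\n"
    )
    later = parse_failure_rates("FailureRate {Machine=dci-mac-build-030} 87\n")
    added, removed = diff_machines(earlier, later)
    print("newly alerting:", sorted(added))
    print("no longer alerting:", sorted(removed))
```

A comment with no metric lines (for example, the "ok" state) parses to an empty dictionary, so the same diff also shows which machines recovered.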

The remediation instructions didn't work:

Unhandled exception: Microsoft.NET.HostModel.AppHost.AppHostSigningException: error: /Applications/Xcode_11.5.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/codesign_allocate: can't open file: /Users/dotnet-bot/temp/xharness (No such file or directory)
/Users/dotnet-bot/temp/xharness: the codesign_allocate helper tool cannot be found or used

   at Microsoft.NET.HostModel.AppHost.HostWriter.CreateAppHost(String appHostSourceFilePath, String appHostDestinationFilePath, String appBinaryFilePath, Boolean windowsGraphicalUserInterface, String assemblyToCopyResorcesFrom, Boolean enableMacOSCodeSign)
   at Microsoft.DotNet.ShellShim.AppHostShellShimMaker.CreateApphostShellShim(FilePath entryPoint, FilePath shimPath)
   at Microsoft.DotNet.ShellShim.ShellShimRepository.<>c__DisplayClass5_0.<CreateShim>b__0()

... so I'm creating an IcM.

MattGal avatar Aug 10 '22 17:08 MattGal

https://portal.microsofticm.com/imp/v3/incidents/details/326847968/home

MattGal avatar Aug 10 '22 17:08 MattGal

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-036} 98
  • FailureRate {Machine=dci-mac-build-171} 94

Go to rule

dotnet-eng-status[bot] avatar Aug 10 '22 22:08 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-036} 98
  • FailureRate {Machine=dci-mac-build-171} 94

Go to rule

dotnet-eng-status[bot] avatar Aug 11 '22 10:08 dotnet-eng-status[bot]

Oh wow, these really sound messed up

premun avatar Aug 11 '22 15:08 premun

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-036} 98
  • FailureRate {Machine=dci-mac-build-171} 94

Go to rule

dotnet-eng-status[bot] avatar Aug 11 '22 22:08 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-036} 98
  • FailureRate {Machine=dci-mac-build-171} 94

Go to rule

dotnet-eng-status[bot] avatar Aug 12 '22 10:08 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-036} 98
  • FailureRate {Machine=dci-mac-build-171} 94

Go to rule

dotnet-eng-status[bot] avatar Aug 12 '22 22:08 dotnet-eng-status[bot]

:green_heart: Metric state changed to ok

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

Go to rule

dotnet-eng-status[bot] avatar Aug 13 '22 10:08 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-030} 87

Go to rule

dotnet-eng-status[bot] avatar Aug 15 '22 10:08 dotnet-eng-status[bot]

I tried wiping the simulators clean on dci-mac-build-030 and put it back into rotation.

premun avatar Aug 15 '22 11:08 premun

Tossing to backlog

MattGal avatar Aug 15 '22 16:08 MattGal

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

Go to rule

dotnet-eng-status[bot] avatar Aug 15 '22 22:08 dotnet-eng-status[bot]

:green_heart: Metric state changed to ok

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

Go to rule

dotnet-eng-status[bot] avatar Aug 16 '22 08:08 dotnet-eng-status[bot]

I am moving this to tracking/blocked since we are waiting on that IcM.

premun avatar Aug 16 '22 08:08 premun

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-147} 83

Go to rule

dotnet-eng-status[bot] avatar Aug 18 '22 22:08 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-053} 83
  • FailureRate {Machine=dci-mac-build-147} 83

Go to rule

dotnet-eng-status[bot] avatar Aug 19 '22 10:08 dotnet-eng-status[bot]

I disabled both and will reset the simulators there.

premun avatar Aug 19 '22 10:08 premun

Simulators reset, machines rebooted, put back into rotation

premun avatar Aug 19 '22 11:08 premun

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-147} 83

Go to rule

dotnet-eng-status[bot] avatar Aug 19 '22 22:08 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-147} 83

Go to rule

dotnet-eng-status[bot] avatar Aug 20 '22 10:08 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-147} 83

Go to rule

dotnet-eng-status[bot] avatar Aug 20 '22 22:08 dotnet-eng-status[bot]

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

  • FailureRate {Machine=dci-mac-build-147} 83

Go to rule

dotnet-eng-status[bot] avatar Aug 21 '22 11:08 dotnet-eng-status[bot]

:green_heart: Metric state changed to ok

Description and instructions for this alert

Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.

Go to rule

dotnet-eng-status[bot] avatar Aug 21 '22 19:08 dotnet-eng-status[bot]

The associated ticket was resolved, so I re-enabled the original machines.

premun avatar Aug 22 '22 08:08 premun