
restarted. Azure DevOps can't recover from restarts.

lewing opened this issue 1 year ago • 5 comments

Build

https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=784852

Build leg reported

Build / linux-x64 debug Libraries_AllConfigurations

Pull Request

https://github.com/dotnet/runtime/pull/106599

Known issue core information

Fill out the known issue JSON section by following the step by step documentation on how to create a known issue

 {
    "ErrorMessage" : "restarted. Azure DevOps can't recover from restarts.",
    "BuildRetry": false,
    "ErrorPattern": "",
    "ExcludeConsoleLog": false
 }
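For reference, a sketch of how the same failure could instead be matched with a regular expression through the ErrorPattern field. Treating ErrorMessage and ErrorPattern as alternatives (fill in one, leave the other empty) is an assumption based on the known-issues documentation, not something stated in this issue:

 {
    "ErrorMessage" : "",
    "BuildRetry": false,
    "ErrorPattern": "restarted\\. Azure DevOps can't recover from restarts\\.",
    "ExcludeConsoleLog": false
 }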

@dotnet/dnceng

Release Note Category

  • [ ] Feature changes/additions
  • [ ] Bug fixes
  • [ ] Internal Infrastructure Improvements

Release Note Description

Additional information about the issue reported

No response

Known issue validation

Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=784852
Error message validated: [restarted. Azure DevOps can't recover from restarts.]
Result validation: :white_check_mark: Known issue matched with the provided build.
Validation performed at: 8/26/2024 7:12:18 PM UTC

Report

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 0 | 0 | 0 |

lewing, Aug 26 '24 19:08

It seems like all reports are pointing to the linux-x64 dev-innerloop leg from this definition: https://dev.azure.com/dnceng-public/public/_build?definitionId=133. GitHub doesn't sync the status and keeps showing the job as running for days. Opened dotnet/runtime#108581 to disable the leg.

am11, Oct 07 '24 05:10

Just before the timeout we see low memory warnings like these:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=829130&view=logs&j=e80acbf0-bc87-577c-4c46-0016b0794913&t=f0fa9d72-e49a-5249-4d28-1199014b9857

Then it hangs for ~20 minutes or so before giving up. The build command has -allConfigurations, so it builds all product and test assemblies for all platforms ({linux,win,osx,freebsd,illumos}-{x86,x64,arm,arm64,riscv64, etc.}) in a single invocation of build (which isn't exactly efficient; we should probably group them), which means that, as it stands, this leg needs a decent amount of RAM.
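For context, a minimal sketch of what that invocation looks like, assuming the dotnet/runtime root build.sh entry point and the libraries subset. Only the -allConfigurations switch comes from the description above; the other arguments are assumptions, and the leg's real arguments live in the pipeline definition (definitionId=133):

  # Hypothetical approximation of the leg's build step (not copied from the pipeline):
  # a single invocation produces product + test assemblies for every target platform at once.
  ./build.sh --subset libs --configuration Debug -allConfigurations

Splitting that single invocation into smaller groups (for example, per target OS) is presumably what "we should probably group them" refers to.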

am11, Oct 07 '24 12:10

@ilyas1974 @markwilkie I don't think our RAM consumption has increased so much that we can no longer handle this configuration. Thoughts?

steveisok, Oct 07 '24 23:10

I took a look at a few passing builds; many have the same logs, where they approach 95% memory usage but eventually succeed.

Here are some samples:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=836601&view=logs&j=e80acbf0-bc87-577c-4c46-0016b0794913&t=f0fa9d72-e49a-5249-4d28-1199014b9857
https://dev.azure.com/dnceng-public/public/_build/results?buildId=838110&view=logs&j=e80acbf0-bc87-577c-4c46-0016b0794913&t=f0fa9d72-e49a-5249-4d28-1199014b9857&l=3923
https://dev.azure.com/dnceng-public/public/_build/results?buildId=837745&view=logs&j=e80acbf0-bc87-577c-4c46-0016b0794913&t=f0fa9d72-e49a-5249-4d28-1199014b9857&l=4228

Near this point I see logs like this:

  initializing ChangeMakerService with capabilities: Baseline, AddMethodToExistingType, AddStaticFieldToExistingType, AddInstanceFieldToExistingType, NewTypeDefinition, ChangeCustomAttributes, UpdateParameters, GenericAddMethodToExistingType, GenericUpdateMethod, GenericAddFieldToExistingType
  baseline ready
  got a change
  parsing patch #1 from /__w/1/s/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.GenericAddInstanceField/GenericAddInstanceField_v1.cs and creating delta
  Found changes in GenericAddInstanceField.cs
  change service made fa564b82-cf1c-4fb0-9d1a-f5ca4c71ff03
  wrote /__w/1/s/artifacts/bin/System.Reflection.Metadata.ApplyUpdate.Test.GenericAddInstanceField/Debug/net10.0/System.Reflection.Metadata.ApplyUpdate.Test.GenericAddInstanceField.dll.1.dmeta
  got a change
  parsing patch #2 from /__w/1/s/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.GenericAddInstanceField/GenericAddInstanceField_v2.cs and creating delta
  Found changes in GenericAddInstanceField.cs
  change service made fa564b82-cf1c-4fb0-9d1a-f5ca4c71ff03
  wrote /__w/1/s/artifacts/bin/System.Reflection.Metadata.ApplyUpdate.Test.GenericAddInstanceField/Debug/net10.0/System.Reflection.Metadata.ApplyUpdate.Test.GenericAddInstanceField.dll.2.dmeta
  done

It looks to me like this is coming from https://github.com/dotnet/hotreload-utils/blob/254ec75de6127c368827d15c3af2477095b8b1b4/src/Microsoft.DotNet.HotReload.Utils.Generator/EnC/ChangeMakerService.cs#L28

Does anyone have an idea why hot reload would be running during a build? I could imagine that if some hot reload service was running during the build, or if tests were running while the product was building, that could explain the high memory usage.

ericstj, Oct 10 '24 17:10

This seems to be happening consistently in PR #109320. Should I just try re-running a third time, or is it never going to work?

snakex64, Oct 30 '24 15:10

@lewing is this still an issue?

missymessa, Sep 26 '25 20:09