
AzDO Networking issue impacting multiple builds

Open · missymessa opened this issue 2 years ago • 14 comments

Issue for tracking the intermittent, inconsistent networking errors we're encountering in our builds.

https://portal.microsofticm.com/imp/v3/incidents/details/292951370/home

{
   "errorMessage" : "net/http: request canceled while waiting for connection"
}
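
This error text comes from the Go net/http client inside the Docker engine: the pull was canceled because a connection to the registry could not be established before the client's timeout expired. As a rough first check of whether an affected agent can reach MCR at all, a curl probe along these lines can help (a sketch, not an official diagnostic; the 10-second limit is arbitrary):

# MCR allows anonymous pulls, so a healthy agent should print an HTTP
# status code (typically 200) well within the limit; a hang or timeout
# here mirrors the docker error above.
curl --silent --show-error --max-time 10 -o /dev/null -w "%{http_code}\n" \
  https://mcr.microsoft.com/v2/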

Report

Build | Definition    | Step Name             | Console log
17054 | dotnet/roslyn | Initialize containers | Log

Summary

24-Hour Hit Count | 7-Day Hit Count | 1-Month Hit Count
0                 | 0               | 1

missymessa · Mar 10 '22 21:03

I have created Azure support ticket 2203110010001681 to help address this.

ilyas1974 · Mar 11 '22 14:03

Are there perhaps plans for a public version of the link above, so we have a general idea of what the cause of the issue might be?

AraHaan · Mar 13 '22 10:03

Good news: it seems my PR is now unblocked by this: https://github.com/dotnet/arcade/pull/8604

AraHaan · Mar 14 '22 09:03

Hi @AraHaan - I don't think there's a way to provide public access to our internal issue tracking system. :( In this case, the title kinda says it all. We're having intermittent issues w/ connectivity and the Azure folks are trying to get to the bottom of it.

Great to see your PR seems to have worked!

@adiaaida and/or @mmitche - would one of you mind taking a look at the PR? (looks good to me)

markwilkie · Mar 14 '22 15:03

I don't have context on the file being changed, so hopefully Matt can look?

michellemcdaniel · Mar 14 '22 16:03

I am adding some new cases affected by this error (a possible retry mitigation is sketched after the list):

  1. Installer Build and Test coreclr Linux_arm64 Release, Pipelines - Run 20220314.2. Error:
docker run -v /mnt/vss/_work/1/s:/root/runtime -w=/root/runtime -e VSS_NUGET_URI_PREFIXES -e VSS_NUGET_ACCESSTOKEN mcr.microsoft.com/dotnet-buildtools/prereqs:rhel-7-rpmpkg-c982313-20174116044113 ./build.sh --ci --subset packs.installers /p:BuildRpmPackage=true /p:Configuration=Release /p:TargetOS=Linux /p:TargetArchitecture=arm64 /p:RuntimeFlavor=coreclr /p:RuntimeArtifactsPath=/root/runtime/artifacts/transport/coreclr /p:RuntimeConfiguration=release /p:LibrariesConfiguration=Release /bl:artifacts/log/Release/msbuild.rpm.installers.binlog
Unable to find image 'mcr.microsoft.com/dotnet-buildtools/prereqs:rhel-7-rpmpkg-c982313-20174116044113' locally
docker: Error response from daemon: Get "https://mcr.microsoft.com/v2/": net/http: request canceled while waiting for connection
  2. Installer Build and Test coreclr Linux_musl_x64 Release, Pipelines - Run 20220314.22, Mono Product Build Linux x64 debug Run 20220316.68. Error:
docker: error pulling image configuration: Get "https://westus2.data.mcr.microsoft.com/01031d61e1024861afee5d512651eb9f-h36fskt2ei//docker/registry/v2/blobs/sha256/d3/d3358c58cff96d0874e70d9ef680e5c69a452079d7d651f9e441c48b62a95144/data?se=2022-03-14T18%3A52%3A56Z&sig=7CM6Q6E1lL%2F07ifd%2FR1VVO%2BRlBbCH%2FiCs8V%2Fki%2BvxXE%3D&sp=r&spr=https&sr=b&sv=2016-05-31&regid=01031d61e1024861afee5d512651eb9f": dial tcp 131.253.33.219:443: i/o timeout.
  3. Build Android arm Release AllSubsets_Mono, Pipelines - Run 20220314.1, Build Browser wasm Linux Release LibraryTests_EAT, Pipelines - Run 20220315.4, Build Linux x64 Release AllSubsets_Mono_LLVMJIT Run 20220316.68. Error:
Error response from daemon: Get "https://mcr.microsoft.com/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
##[error]Docker pull failed with exit code 1
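
Until the underlying registry/CDN problem is resolved, wrapping the pull in a retry loop with backoff is a common stopgap. A minimal bash sketch, assuming it runs as a pipeline step before the docker run above (the image tag is copied from the failing rhel-7 leg purely for illustration; the retry count and delays are arbitrary):

#!/usr/bin/env bash
# Retry the flaky pull with linear backoff so the subsequent docker run
# finds the image locally instead of pulling it on demand.
image="mcr.microsoft.com/dotnet-buildtools/prereqs:rhel-7-rpmpkg-c982313-20174116044113"

for attempt in 1 2 3 4 5; do
  if docker pull "$image"; then
    exit 0
  fi
  echo "docker pull failed (attempt $attempt); retrying in $((attempt * 15))s..."
  sleep $((attempt * 15))
done

echo "##[error]docker pull still failing after 5 attempts" >&2
exit 1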

ilonatommy · Mar 16 '22 07:03

Are we tracking the numerous MCR download failures here or somewhere else❔ That problem seems to be getting worse instead of better, e.g.:

  • https://dev.azure.com/dnceng/internal/_build/results?buildId=1664939&view=results
  • https://dev.azure.com/dnceng/internal/_build/results?buildId=1665084&view=results
  • https://dev.azure.com/dnceng/internal/_build/results?buildId=1665221&view=results
  • https://dev.azure.com/dnceng/internal/_build/results?buildId=1665434&view=results

dougbu · Mar 16 '22 17:03

Per information from @agocke, there is a problem with the CDN behind MCR.

ilyas1974 · Mar 16 '22 21:03

> Per information from @agocke, there is a problem with the CDN behind MCR.

Tracking in 295259702

garath · Mar 16 '22 22:03

After doing some additional research, we could not find any recent instances of this error. Should it occur again, we will open another issue with the MCR team.

ilyas1974 · Mar 31 '22 18:03

@ilyas1974 https://github.com/dotnet/roslyn/pull/56162/checks?check_run_id=5776024591

AraHaan · Mar 31 '22 21:03

Reopening, as new instances of this issue are happening, such as:

  • https://dev.azure.com/dnceng/public/_build/results?buildId=1752769&view=logs&j=174348cd-3455-59d8-c7f7-32c969b0807d&t=eda8acff-ca02-497c-b15e-87884972117b&l=29

AlitzelMendez · May 04 '22 23:05

Do y'all think this should be a known build error, or marked as critical? My impression is that the hit count is low enough that it doesn't meet the critical bar....but not sure if we have a bar yet.... :) Thoughts @ilyas1974 ?

markwilkie · Aug 09 '22 15:08

We set a bar of 200 jobs impacted before we engage a partner team (metric review back in May).

ilyas1974 · Aug 10 '22 11:08

As there have not been any instances of this issue for the last 7 days, I am closing this issue.

ilyas1974 · Nov 16 '22 15:11