Infrastructure - Status/Health
Overview
Please use these queries to discover issues
Blocking CI
Blocking CI Optional
Blocking Outerloop
Goals
- A minimum 95% passing rate for the runtime pipeline
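For anyone who wants to check the goal against real numbers, here is a minimal sketch of the arithmetic. The result strings and the sample data are assumptions for illustration; they are not taken from the actual pipeline.

```python
def pass_rate(results):
    """Return the fraction of runs whose result is 'succeeded'."""
    if not results:
        raise ValueError("no pipeline runs to evaluate")
    passed = sum(1 for r in results if r == "succeeded")
    return passed / len(results)


def meets_goal(results, minimum=0.95):
    """True when the pass rate is at or above the minimum (95% by default)."""
    return pass_rate(results) >= minimum


# Hypothetical sample: 19 green runs and 1 red run.
sample = ["succeeded"] * 19 + ["failed"]
print(f"{pass_rate(sample):.1%}", meets_goal(sample))
```

With 20 runs, a single failure still meets the goal; a second failure drops the rate to 90% and misses it.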
Resources
/cc @dotnet/coreclr-infra
/cc @dotnet/jit-contrib
https://github.com/dotnet/coreclr/issues/26057 Failed to resolve SDK 'Microsoft.DotNet.Helix.Sdk'
dotnet/coreclr#27453 Test Infrastructure Failure: Access to the path ... is denied
Summary of the week of 21-Oct-2019
Problems (cross out signifies the problem is fixed)
- ~~OSX build machines failing to take work~~
- Failed to resolve SDK 'Microsoft.DotNet.Helix.Sdk' dotnet/coreclr#26057
- ~~Linux arm musl build fails~~
- ~~Clang 5.0 throws a stack trace occasionally when building arm or arm64 targets~~
@jashook does this issue need a new owner while you are on vacation?
I assume the ownership is now shared.
/cc @trylek @ViktorHofer @jkoritzinsky @dagood @jaredpar
Libraries Build Windows_NT x86 Release leg is failing 100% of the time. See https://github.com/dotnet/runtime/pull/967
Interesting. To me it looks like a code issue rather than an infra hiccup though:
Fatal error. 0xC0000005
at DynamicClass.WriteTypeWithDateTimeOffsetTypePropertyToJson(System.Runtime.Serialization.XmlWriterDelegator, System.Object, System.Runtime.Serialization.Json.XmlObjectSerializerWriteContextComplexJson, System.Runtime.Serialization.ClassDataContract, System.Xml.XmlDictionaryString[])
at System.Runtime.Serialization.Json.JsonClassDataContract.WriteJsonValueCore(System.Runtime.Serialization.XmlWriterDelegator, System.Object, System.Runtime.Serialization.Json.XmlObjectSerializerWriteContextComplexJson, System.RuntimeTypeHandle)
at System.Runtime.Serialization.Json.JsonDataContract.WriteJsonValue(System.Runtime.Serialization.XmlWriterDelegator, System.Object, System.Runtime.Serialization.Json.XmlObjectSerializerWriteContextComplexJson, System.RuntimeTypeHandle)
at System.Runtime.Serialization.Json.DataContractJsonSerializerImpl.WriteJsonValue(System.Runtime.Serialization.Json.JsonDataContract, System.Runtime.Serialization.XmlWriterDelegator, System.Object,
Perhaps some traditional shenanigans regarding time zone settings on the test machines?
cc @ahsonkhan @steveharter
That looks like it came from my dotnet/runtime#737, I'm pretty sure I know what the problem is. The odd thing is that the CI was green when the PR was merged and it's not very clear why.
Looks like the same CI (Libraries Build Windows_NT x86 Release) failed in my PR dotnet/runtime#842
https://helix.dot.net/api/2019-06-17/jobs/068da4f0-9282-4e18-a1de-c2baaecf32b0/workitems/System.Runtime.Serialization.Json.Tests/console
Fatal error. 0xC0000005
at DynamicClass.WriteTypeWithDateTimeOffsetTypePropertyToJson(System.Runtime.Serialization.XmlWriterDelegator, System.Object, System.Runtime.Serialization.Json.XmlObjectSerializerWriteContextComplexJson, System.Runtime.Serialization.ClassDataContract, System.Xml.XmlDictionaryString[])
at System.Runtime.Serialization.Json.JsonClassDataContract.WriteJsonValueCore(System.Runtime.Serialization.XmlWriterDelegator, System.Object, System.Runtime.Serialization.Json.XmlObjectSerializerWriteContextComplexJson, System.RuntimeTypeHandle)
at System.Runtime.Serialization.Json.JsonDataContract.WriteJsonValue(System.Runtime.Serialization.XmlWriterDelegator, System.Object, System.Runtime.Serialization.Json.XmlObjectSerializerWriteContextComplexJson, System.RuntimeTypeHandle)
That looks like it came from my dotnet/runtime#737, I'm pretty sure I know what the problem is. The odd thing is that the CI was green when the PR was merged and it's not very clear why.
Your PR was green because of the current state we’re in.
- In order to achieve building live live, we needed to disable running the libraries tests on coreclr PRs.
- The only pipeline that runs libraries tests is runtime-libraries which is conditioned to only run when the change includes a change to src/libraries/*. Since your change only touched coreclr, it didn’t run in your PR.
I’m working on fixing this and moving to a single pipeline that always runs, so that libraries tests run whenever coreclr or libraries are touched.
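To illustrate why a coreclr-only change skipped the libraries tests, here is a rough sketch of path-conditioned triggering. This is not how the real pipelines are defined. The `runtime` pipeline name, the `src/coreclr/*` pattern, and the changed file path are assumptions made up for the example.

```python
from fnmatch import fnmatchcase

# Current state described above: only runtime-libraries runs libraries tests,
# and only when something under src/libraries/ changes.
CURRENT_TRIGGERS = {
    "runtime-libraries": ["src/libraries/*"],
}

# Hypothetical single pipeline that reacts to coreclr or libraries changes.
PROPOSED_TRIGGERS = {
    "runtime": ["src/coreclr/*", "src/libraries/*"],
}


def pipelines_to_run(changed_files, triggers):
    """Return the pipelines whose path patterns match any changed file."""
    selected = []
    for name, patterns in triggers.items():
        if any(fnmatchcase(path, pat) for path in changed_files for pat in patterns):
            selected.append(name)
    return selected


# Hypothetical PR that touches only coreclr.
coreclr_only = ["src/coreclr/src/vm/method.cpp"]
print(pipelines_to_run(coreclr_only, CURRENT_TRIGGERS))   # nothing runs libraries tests
print(pipelines_to_run(coreclr_only, PROPOSED_TRIGGERS))  # the single pipeline runs
```

Under the current mapping the coreclr-only change selects no pipeline that runs libraries tests, which is how a regression in libraries tests can merge on a green PR.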
CoreCLR Test Run Windows_NT arm legs are failing 100% for all PRs currently (https://github.com/dotnet/runtime/issues/1097)
dotnet/runtime#129 Non-deterministic failure hit by CoreCLR tests: Assert failure(PID 2664 [0x00000a68], Thread: 3612 [0x0e1c]): pMethodDesc->GetCallCounter()->IsCallCountingEnabled(pMethodDesc)
Updated issue for failure in CoreCLR Pri0 Test Run Windows_NT x64 checked tests timing out:
cmdLine:C:\h\w\A995095C\w\B9DA09DC\e\tracing\eventpipe\providervalidation\providervalidation\providervalidation.cmd Timed Out
Test Harness Exitcode is : -100
To run the test:
> set CORE_ROOT=C:\h\w\A995095C\p
> C:\h\w\A995095C\w\B9DA09DC\e\tracing\eventpipe\providervalidation\providervalidation\providervalidation.cmd
Expected: True
Actual: False
cc: @josalem is working on fixing it.
https://github.com/dotnet/runtime/issues/2209: Unable to pull image mcr.microsoft.com/...
A lot of PRs are failing with:
Unhandled exception. System.IO.FileLoadException: Could not load file or assembly 'System.Runtime.CompilerServices.Unsafe, Version=5.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The located assembly's manifest definition does not match the assembly reference. (0x80131040 (FUSION_E_REF_DEF_MISMATCH))
File name: 'System.Runtime.CompilerServices.Unsafe, Version=5.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
https://github.com/dotnet/runtime/pull/2344 is reverting the change that introduced the problem.
I just saw an issue related to helix in one of my PRs:
https://github.com/dotnet/core-eng/issues/8694
Adding to the description.
It is not convenient to keep updating this issue with all intermittent test failures hit by the CI. I have started marking issues that are intermittently causing CI failures with the blocking-clean-ci label. This label did exist in the repo, but it was not used for a while - time to start using it again.
Query: https://github.com/dotnet/runtime/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+label%3Ablocking-clean-ci+
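If it helps, the same kind of query link can be built for any label. A small sketch using only the Python standard library; the helper name `label_query_url` is made up for this example.

```python
from urllib.parse import urlencode


def label_query_url(repo, label):
    """Build a GitHub issue search URL for open issues carrying the given label."""
    query = f"is:issue is:open label:{label}"
    return f"https://github.com/{repo}/issues?" + urlencode({"q": query})


print(label_query_url("dotnet/runtime", "blocking-clean-ci"))
```

This produces the same `q=` value as the link above, without the `utf8` parameter and the trailing `+`.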
It is not convenient to keep updating this issue with all intermittent test failures hit by the CI
Agree. Going forward I would prefer this be more of a status page for the repository. A place to visit to quickly check if you're running into a known issue and link to a place to find more information.
I have started marking issues that are intermittently causing CI failures with the blocking-clean-ci label. This label did exist in the repo, but it was not used for a while - time to start using it again.
+1
Link dotnet/runtime#32835
Unpinning for a bit
@danmosemsft why? This is pinned so that devs can find active infra issues easily.
Because we can only pin 3 issues and I added a new one. Which do you want to drop? :)
FWIW, I find the permanently pinned issues distracting. I am actively forcing myself to avoid clicking on the "x" button because it would unpin the issue for everybody. I have done it several times by accident. Muscle memory: you see "x" next to a thing that you do not want to see anymore, so you automatically click it to make it go away. I wish GitHub allowed me to hide the pinned issues that I have seen a hundred times already.
I hear you - I don't know that we have any better way to communicate with the community. Unless we add something to the top of the readme, which may not be noticed.
I am wondering who is the target audience for the pinned issues. The pinned issues communicate the following currently:
- We have 12 overarching themes that we are working on in .NET 5. Should this rather be mentioned next to the roadmap link in the readme?
- We have 30 different sources of CI and official build flakiness
- We are using 5.0 as the milestone for .NET 5.0 issues
- We are going to rename the test directory in two weeks
Are these the most important things we want everybody in the community to know?
😁 Maybe the stickies should only be for announcements (sticks around for a week or two only) and anything else should be linked from the readme. My guess is that nobody reads the readme after they've read it once, of course.
Looks like the outerloop job is not in good shape right now (pipeline): all Linux_musl x64/arm64 send tests are failing with:
2020-08-22T21:30:49.2246346Z Uploading payloads for Job on (Alpine.312.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:alpine-3.12-helix-20200602002622-e06dc59...
2020-08-22T21:30:49.2279177Z /__w/1/s/.packages/microsoft.dotnet.helix.sdk/5.0.0-beta.20407.3/tools/Microsoft.DotNet.Helix.Sdk.MonoQueue.targets(47,5): error : Correlation Payload '/__w/1/s/artifacts/tests/coreclr/Linux_musl.x64.Checked/Tests/Core_Root/' not found. [/__w/1/s/src/coreclr/tests/helixpublishwitharcade.proj]
2020-08-22T21:30:49.2404592Z ##[error].packages/microsoft.dotnet.helix.sdk/5.0.0-beta.20407.3/tools/Microsoft.DotNet.Helix.Sdk.MonoQueue.targets(47,5): error : (NETCORE_ENGINEERING_TELEMETRY=Build) Correlation Payload '/__w/1/s/artifacts/tests/coreclr/Linux_musl.x64.Checked/Tests/Core_Root/' not found.
I have not found a separate issue about that, could somebody take a look?