System.IO.Tests crash in CI (Linux arm64)
Discovering: System.IO.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.IO.Tests (found 736 of 744 test cases)
Starting: System.IO.Tests (parallel test collections = on [2 threads], stop on fail = off)
./RunTests.sh: line 180: 20 Killed "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.Tests.runtimeconfig.json --depsfile System.IO.Tests.deps.json xunit.console.dll System.IO.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Fri Mar 29 11:20:29 UTC 2024 ----- exit code 137 ----------------------------------------------------------
Build Information
Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=623676
Build error leg or test failing: System.IO.Tests.WorkItemExecution
Pull request: https://github.com/dotnet/runtime/pull/100433
Error Message
{
"ErrorMessage": ["arm64", "System.IO.Tests", "Killed", "-- exit code 137 --"],
"ErrorPattern": "",
"BuildRetry": false,
"ExcludeConsoleLog": false
}
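As a rough illustration of how a list-valued `ErrorMessage` could be checked against a console log, here is a minimal Python sketch. It assumes that each string must appear as a substring, in the listed order, on successive (not necessarily adjacent) log lines; that rule and the function name `matches_in_order` are assumptions made for illustration, not the actual implementation of the known-issues tooling. The sample lines are taken from the log above (the `Killed` line is truncated), and `"arm64"` is left out of the demo because the line that contains it is not part of the excerpt.

```python
def matches_in_order(log_lines, patterns):
    """Return True if every pattern occurs as a substring, in order,
    each on a later line than the previous match (assumed rule)."""
    remaining = iter(log_lines)
    # Each any() consumes lines up to and including its match, so the
    # next pattern can only match on a later line.
    return all(any(p in line for line in remaining) for p in patterns)


log = [
    "Discovering: System.IO.Tests (method display = ClassAndMethod, method display options = None)",
    './RunTests.sh: line 180: 20 Killed "$RUNTIME_PATH/dotnet" exec',  # truncated
    "----- end Fri Mar 29 11:20:29 UTC 2024 ----- exit code 137 ----------",
]

print(matches_in_order(log, ["System.IO.Tests", "Killed", "-- exit code 137 --"]))
```

Under this assumed rule the order of the strings matters, so the entries should follow the order in which they show up in the log.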
Known issue validation
Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=623676
Error message validated: [arm64 System.IO.Tests Killed -- exit code 137 --]
Result validation: :white_check_mark: Known issue matched with the provided build.
Validation performed at: 3/29/2024 2:52:42 PM UTC
Report
| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 665654 | dotnet/runtime | System.IO.Tests.WorkItemExecution | dotnet/runtime#101843 |
| 644724 | dotnet/runtime | System.IO.Tests.WorkItemExecution | dotnet/runtime#100706 |
| 644673 | dotnet/runtime | System.IO.Tests.WorkItemExecution | dotnet/runtime#101082 |
Summary
| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 0 | 3 |
Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.
Exit code 137 means the process was killed with SIGKILL, which here most likely means it ran out of memory. The tests started to fail not only in main but also in older branches where we have not touched the code at all: https://github.com/dotnet/runtime/issues/100558
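For reference, shells report a process terminated by a signal as 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL), the signal the Linux OOM killer sends. A minimal Python sketch of that decoding, assuming a POSIX system (the function name `describe_exit_code` is illustrative):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit code into a signal name when it is above 128."""
    if code > 128:
        try:
            # 128 + N is the shell convention for "terminated by signal N".
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


print(describe_exit_code(137))
```

Note that SIGKILL only says the process was killed from outside; confirming that the OOM killer was responsible would need the kernel log of the machine.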
@dotnet/area-infrastructure-libraries Is it possible that the test VMs simply have less memory available now?
I don't think that we have access to that information for a Helix test client. It might make sense to print some diagnostics in the RunTests.sh/cmd script, e.g. available RAM and disk space.
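A sketch of what such diagnostics could look like, written in Python for illustration (a real change would more likely be a few lines of shell in RunTests.sh). It assumes a Linux machine where `/proc/meminfo` is readable; the function names `parse_meminfo` and `print_diagnostics` are hypothetical.

```python
import shutil


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def print_diagnostics(meminfo_path: str = "/proc/meminfo", disk_path: str = ".") -> None:
    """Print total/available RAM and free disk space for the work item directory."""
    try:
        with open(meminfo_path) as f:
            mem = parse_meminfo(f.read())
        print(f"MemTotal:     {mem.get('MemTotal', 0) // 1024} MiB")
        print(f"MemAvailable: {mem.get('MemAvailable', 0) // 1024} MiB")
    except OSError as e:
        # Not Linux, or /proc is not mounted in the container.
        print(f"Could not read {meminfo_path}: {e}")
    usage = shutil.disk_usage(disk_path)
    mib = 1024 * 1024
    print(f"Disk free:    {usage.free // mib} MiB of {usage.total // mib} MiB")


if __name__ == "__main__":
    print_diagnostics()
```

Printing these values before the test run starts would show up in the console log of every work item, so failing and passing runs could be compared.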
> Is it possible that the test VMs simply have less memory available now?
@adamsitnik I'd be surprised if something like that happened, but we can double check: @dotnet/dnceng do you know?
The thing is, this OOM failure is only happening in System.IO and System.IO.Net5Compat. I am pretty sure I don't see it anywhere else.
~~One thing that could help you is that this failure is also happening in 6.0 and 8.0, meaning something got backported, so that could help you narrow down the checkins, as we don't modify System.IO often.~~ Never mind, you already answered that above.
This is an intermittent issue, so maybe widen the date range a bit more? When was the last time a System.IO change went into servicing before April?
The failure was most likely triggered by a Linux kernel update, a Docker container update, or a test infra update. These updates are rolled out regularly in the background. I do not think it is a good use of time to try to find the exact update that triggered this failure months ago. We won't be able to do much with that information.
The failure is likely triggered by a test that consumes too many resources. It does not have to be direct memory use. For example, the test could be creating too many file handles, which then manifests as exit code 137. I think we should try to find the offending test or tests, e.g. by trying to reproduce the failure with verbose logging.
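One way to gather that evidence, sketched in Python under the assumption that a side script can observe the test process on the Linux machine: sample the process's open file descriptor count and resident memory from `/proc` at intervals and log them next to the verbose test output. The function names and the `proc_root` parameter are illustrative and not part of the existing test infrastructure.

```python
import os
from typing import Optional


def count_open_fds(pid: int, proc_root: str = "/proc") -> int:
    """Count the entries in <proc_root>/<pid>/fd (one per open file descriptor)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def read_status_kb(pid: int, field: str, proc_root: str = "/proc") -> Optional[int]:
    """Read a kB-valued field such as VmRSS or VmHWM from <proc_root>/<pid>/status."""
    with open(os.path.join(proc_root, str(pid), "status")) as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name == field:
                return int(rest.split()[0])
    return None


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    # Demo on the current process; a real monitor would poll the test process's pid.
    me = os.getpid()
    print(f"open fds: {count_open_fds(me)}")
    print(f"VmRSS: {read_status_kb(me, 'VmRSS')} kB, VmHWM: {read_status_kb(me, 'VmHWM')} kB")
```

Correlating a jump in either number with the test names in the verbose log would point at candidate tests, although samples taken at intervals can miss short spikes.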
Failed in the following leg of runtime-coreclr libraries-pgo/20240810.1:
net9.0-linux-Release-arm64-fullpgo_random_gdv-(Ubuntu.2004.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-20.04-helix-arm64v8
Starting: System.IO.Tests (parallel test collections = on [2 threads], stop on fail = off)
./RunTests.sh: line 182: 26 Killed "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.Tests.runtimeconfig.json --depsfile System.IO.Tests.deps.json xunit.console.dll System.IO.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE