runtime
runtime copied to clipboard
Tracking issue for CI build timeouts
{
"ErrorMessage" : "##[error]The operation was canceled.",
"BuildRetry": false
}
See https://github.com/dotnet/arcade/issues/11072 for more context.
Report
| Build | Definition | Step Name | Console log | Pull Request |
|---|---|---|---|---|
| 534201 | dotnet/runtime | Send tests to Helix (Windows) | Log | dotnet/runtime#97244 |
| 2358103 | dotnet-runtime | Build product | Log | |
| 534171 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#96386 |
| 2358024 | dotnet-runtime | Build workload artifacts | Log | |
| 2358017 | dotnet-runtime | Send job to Helix (Windows) | Log | |
| 534048 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#97072 |
| 533692 | dotnet/runtime | Send to Helix | Log | dotnet/runtime#95758 |
| 2357687 | dotnet-runtime | Build product | Log | |
| 533488 | dotnet/runtime | Build Tests | Log | dotnet/runtime#96678 |
| 533432 | dotnet/runtime | Build native test components | Log | dotnet/runtime#97077 |
| 533401 | dotnet/runtime | Build product | Log | dotnet/runtime#97212 |
| 533135 | dotnet/runtime | Build | Log | dotnet/runtime#97075 |
| 533061 | dotnet/runtime | Build | Log | dotnet/runtime#97202 |
| 533081 | dotnet/runtime | Build product | Log | dotnet/runtime#97182 |
| 533018 | dotnet/runtime | Build product | Log | dotnet/runtime#96807 |
| 532656 | dotnet/runtime | Build product | Log | dotnet/runtime#97185 |
| 532798 | dotnet/runtime | LLVM AOT compile CoreCLR tests | Log | dotnet/runtime#97189 |
| 532750 | dotnet/runtime | Build native test components | Log | dotnet/runtime#97129 |
| 532726 | dotnet/runtime | Send to Helix | Log | dotnet/runtime#97075 |
| 532677 | dotnet/runtime | Build native test components | Log | dotnet/runtime#97090 |
| 532618 | dotnet/runtime | Build | Log | dotnet/runtime#96826 |
| 2357085 | dotnet-runtime | Build product | Log | |
| 532529 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#97114 |
| 532532 | dotnet/runtime | Build product | Log | dotnet/runtime#95232 |
| 532437 | dotnet/runtime | Generate AOT offsets | Log | dotnet/runtime#97159 |
| 532325 | dotnet/runtime | Send to Helix | Log | dotnet/runtime#95758 |
| 531991 | dotnet/runtime | Build native test components | Log | dotnet/runtime#97087 |
| 531999 | dotnet/runtime | Run Trimming Tests | Log | dotnet/runtime#96807 |
| 2356757 | dotnet-runtime | Generate AOT offsets | Log | |
| 2356610 | dotnet-runtime | Send job to Helix (Unix) | Log | |
| 531705 | dotnet/runtime | Build CoreCLR Runtime | Log | |
| 531697 | dotnet/runtime | Build product | Log | |
| 531606 | dotnet/runtime | Build native test components | Log | |
| 531515 | dotnet/runtime | LLVM AOT compile CoreCLR tests | Log | dotnet/runtime#97076 |
| 2356425 | dotnet-runtime | Send job to Helix (Unix) | Log | |
| 531484 | dotnet/runtime | Build product | Log | dotnet/runtime#97114 |
| 531459 | dotnet/runtime | Prepare TestHost with runtime CoreCLR | Log | dotnet/runtime#97090 |
| 2356407 | dotnet-runtime | Build product | Log | |
| 531361 | dotnet/runtime | Send tests to Helix (Windows) | Log | dotnet/runtime#97130 |
| 2356226 | dotnet-runtime | Send job to Helix (Unix) | Log | |
| 531205 | dotnet/runtime | Send to Helix | Log | dotnet/runtime#95758 |
| 2356184 | dotnet-runtime | Send job to Helix (Windows) | Log | |
| 2356223 | dotnet-runtime | Build product | Log | |
| 531058 | dotnet/runtime | Build Tests | Log | dotnet/runtime#97118 |
| 530819 | dotnet/runtime | Build product | Log | dotnet/runtime#97104 |
| 530821 | dotnet/runtime | Build product | Log | dotnet/runtime#97104 |
| 530579 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#88167 |
| 530528 | dotnet/runtime | Restore and Build Product | Log | dotnet/runtime#90505 |
| 530465 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#97052 |
| 530438 | dotnet/runtime | Build product | Log | dotnet/runtime#97011 |
| 2355601 | dotnet-runtime | Send job to Helix (Unix) | Log | |
| 530184 | dotnet/runtime | Build product | Log | dotnet/runtime#96763 |
| 530027 | dotnet/runtime | Build product | Log | |
| 530017 | dotnet/runtime | Build product | Log | |
| 529925 | dotnet/runtime | Build product | Log | dotnet/runtime#97079 |
| 2355481 | dotnet-runtime | Send job to Helix (Unix) | Log | |
| 529815 | dotnet/runtime | Build CoreCLR Runtime | Log | |
| 2355307 | dotnet-runtime | Download IntermediateArtifacts | Log | |
| 529747 | dotnet/runtime | LLVM AOT compile CoreCLR tests | Log | dotnet/runtime#97076 |
| 529697 | dotnet/runtime | Build Tests | Log | dotnet/runtime#95980 |
| 529637 | dotnet/runtime | Send tests to Helix (Unix) | Log | |
| 529699 | dotnet/runtime | Build product | Log | dotnet/runtime#95980 |
| 2355305 | dotnet-runtime | Send job to Helix (Windows) | Log | |
| 2355308 | dotnet-runtime | Send job to Helix (Windows) | Log | |
| 2355312 | dotnet-runtime | Send job to Helix (Windows) | Log | |
| 529666 | dotnet/runtime | Send to Helix | Log | |
| 529626 | dotnet/runtime | Build native test components | Log | |
| 2355315 | dotnet-runtime | Build product | Log | |
| 2355311 | dotnet-runtime | Build CoreCLR Runtime | Log | |
| 529634 | dotnet/runtime | Build product | Log | |
| 2355267 | dotnet-runtime | Send job to Helix (Unix) | Log | |
| 529597 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#97064 |
| 2355245 | dotnet-runtime | Send job to Helix (Windows) | Log | |
| 529434 | dotnet/runtime | Send tests to Helix (Unix) | Log | |
| 529450 | dotnet/runtime | Build Tests | Log | dotnet/runtime#97053 |
| 529452 | dotnet/runtime | Build product | Log | dotnet/runtime#97053 |
| 529381 | dotnet/runtime | Build product | Log | dotnet/runtime#96650 |
| 529303 | dotnet/runtime | Build product | Log | dotnet/runtime#96808 |
| 529225 | dotnet/runtime | Send to Helix | Log | |
| 529237 | dotnet/runtime | Build product | Log | dotnet/runtime#97047 |
| 529020 | dotnet/runtime | Send tests to Helix (Unix) | Log | |
| 2354830 | dotnet-runtime | Send job to Helix (Windows) | Log | |
| 529070 | dotnet/runtime | Send to Helix | Log | |
| 529009 | dotnet/runtime | Build native test components | Log | |
| 2354813 | dotnet-runtime | Build product | Log | |
| 2354675 | dotnet-runtime | Generate AOT offsets | Log | |
| 528584 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#96676 |
| 528546 | dotnet/runtime | Build product | Log | dotnet/runtime#88167 |
| 528325 | dotnet/runtime | Send tests to Helix (Unix) | Log | |
| 528411 | dotnet/runtime | Build product | Log | dotnet/runtime#97023 |
| 2354374 | dotnet-runtime | Send job to Helix (Windows) | Log | |
| 2354377 | dotnet-runtime | Restore and Build Product | Log | |
| 528321 | dotnet/runtime | Build CoreCLR Runtime | Log | |
| 528320 | dotnet/runtime | Build CoreCLR Runtime | Log | |
| 528099 | dotnet/runtime | Build native test components | Log | dotnet/runtime#96960 |
| 528107 | dotnet/runtime | Build product | Log | dotnet/runtime#97012 |
| 2354095 | dotnet-runtime | Build product | Log | |
| 527910 | dotnet/runtime | Send tests to Helix (Windows) | Log | dotnet/runtime#96650 |
| 527349 | dotnet/runtime | Build native test components | Log | dotnet/runtime#96982 |
| 527282 | dotnet/runtime | Build product | Log | |
| Displaying 100 of 842 results |
Summary
| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 3 | 106 | 842 |
Known issue validation
Build: :mag_right: Result validation: :warning: Provided build not found. Provide a valid build in the "Build: :mag_right:" line. Validation performed at: 11/30/2023 9:39:02 PM UTC
Tagging subscribers to this area: @dotnet/runtime-infrastructure See info in area-owners.md if you want to be subscribed.
Issue Details
{
"ErrorMessage" : "The operation was canceled.",
"BuildRetry": false
}
See https://github.com/dotnet/arcade/issues/11072 for more context.
| Author: | jkotas |
|---|---|
| Assignees: | - |
| Labels: |
|
| Milestone: | - |
#82476 can be ignored, it's intentionally running tests in a slower mode to flush out failures
I have noticed that the osx-x64 Release NativeAOT timeouted in my PR #85366, but from its log, it is obvious that it has completed running (https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_apis/build/builds/252395/logs/1833). The leg took few seconds over two hours. Maybe we could bump the timeout just a bit.
I have noticed that the osx-x64 Release NativeAOT timeouted in my PR https://github.com/dotnet/runtime/pull/85366, but from its log, it is obvious that it has completed running (https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_apis/build/builds/252395/logs/1833)
The log shows that the leg was waiting for 1+ hour for a Helix machine. This happens when we are sending more work to a Helix queue than what it has capacity for. Increasing the timeouts is not going to fix it.
Ah, I can see it now, I've missed that.
Looking at this, our queue depth is not looking great for OSX 12.

Should we be trying to spill some work to 13 and 11?
cc: @dotnet/runtime-infrastructure @dotnet/dnceng
Hmm, These are the largest work producers:

With PRs like https://github.com/dotnet/runtime/pull/85389 https://github.com/dotnet/runtime/pull/85343 being a ton of the work.

The runtime time distribution seems to be consistent over the last couple weeks. We are 263043/307320 workitems sent over the last 3 days to the queue. That being said. It looks like in terms of workitem compute time, ~it looks like SDK is sending very long lived workitems~ what makes SDK take so much compute time is the different branches.
The following two CI legs have been failing due to timeouts for a at least two weeks:
Libraries Test Run release mono linux x64 DebugLibraries Test Run release coreclr linux x64 Debug
@vcsjones is facing it in https://github.com/dotnet/runtime/pull/87099 as well.
Is anyone investigating this issue?
Seeing the same legs time out on as @adamsitnik, too (Libraries Test Run release {mono,coreclr} linux x64 Debug)
I see something like this in both of the console logs:
Sending Job to (Centos.8.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:centos-stream8-helix...
Sent Helix Job; see work items at https://helix.dot.net/api/jobs/f0271bc6-76fb-445b-a33f-c02c71caa5fb/workitems?api-version=2019-06-17
Sending Job to (AlmaLinux.8.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:almalinux-8-helix-amd64...
Sent Helix Job; see work items at https://helix.dot.net/api/jobs/c66c9c1c-8014-44d0-a94d-9aa1697c740d/workitems?api-version=2019-06-17
Sending Job to (Debian.11.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:debian-11-helix-amd64...
Sent Helix Job; see work items at https://helix.dot.net/api/jobs/16e4db17-152c-4b75-af4a-d6e4e924cb10/workitems?api-version=2019-06-17
Sending Job to Ubuntu.1804.Amd64.Open...
Sent Helix Job; see work items at https://helix.dot.net/api/jobs/a7884197-1f40-4f66-bcdd-ae3b24af1100/workitems?api-version=2019-06-17
Waiting for completion of job a7884197-1f40-4f66-bcdd-ae3b24af1100 on Ubuntu.1804.Amd64.Open (Details: https://helix.dot.net/api/jobs/a7884197-1f40-4f66-bcdd-ae3b24af1100/details?api-version=2019-06-17 )
Waiting for completion of job c66c9c1c-8014-44d0-a94d-9aa1697c740d on (AlmaLinux.8.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:almalinux-8-helix-amd64 (Details: https://helix.dot.net/api/jobs/c66c9c1c-8014-44d0-a94d-9aa1697c740d/details?api-version=2019-06-17 )
Waiting for completion of job f0271bc6-76fb-445b-a33f-c02c71caa5fb on (Centos.8.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:centos-stream8-helix (Details: https://helix.dot.net/api/jobs/f0271bc6-76fb-445b-a33f-c02c71caa5fb/details?api-version=2019-06-17 )
Waiting for completion of job 16e4db17-152c-4b75-af4a-d6e4e924cb10 on (Debian.11.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:debian-11-helix-amd64 (Details: https://helix.dot.net/api/jobs/16e4db17-152c-4b75-af4a-d6e4e924cb10/details?api-version=2019-06-17 )
Job f0271bc6-76fb-445b-a33f-c02c71caa5fb on (Centos.8.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:centos-stream8-helix is completed with 243 finished work items.
Job 16e4db17-152c-4b75-af4a-d6e4e924cb10 on (Debian.11.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:debian-11-helix-amd64 is completed with 243 finished work items.
Job a7884197-1f40-4f66-bcdd-ae3b24af1100 on Ubuntu.1804.Amd64.Open is completed with 243 finished work items.
##[error]The operation was canceled.
On both legs, it looks like it's the (AlmaLinux.8.Amd64.Open) job that didn't finish.
I wonder if it is the same distro on other runs, too.
There's another issue about the recent AlmaLinux timeouts https://github.com/dotnet/runtime/issues/87667 - the AlmaLinux leg has been removed by https://github.com/dotnet/runtime/pull/87668
This issue seems to match also generic test timeout failures, such as
System.Net.Http.HttpRequestException : Requesting HTTP version 3.0 with version policy RequestVersionExact while unable to establish HTTP/3 connection.
---- System.OperationCanceledException : The operation was canceled.
is that the intent here?
is that the intent here?
It is not the indent. I have tweaked the matching string.
Removing Known Build Error label because it is marking Build Analysis green. We want developers either to wait until infra timeout issue is resolved and rerun their tests or to use their judgement to use the escape path to merge.
Known Build Error label is useful to get stats on the infra issues such as timeout. On the other hand, it allows PRs to get merged whle tests are not fully run. So, it is something we need to think about how we should handle such cases.
Known Build Error label is useful to get stats on the infra issues such as timeout
This was the sole purpose of this tracking issue. If we do not have it marked as Known Build Error, it can be closed.