runner-images
Windows runners are consistently extremely slow compared to Linux and macOS
Description
In the CUE project we are seeing incredibly slow run times on Windows 2022 runners.
For a recent example see https://github.com/cue-lang/cue-trybot/actions/runs/4477594562/.
Roughly speaking, here are the numbers we are seeing averaged out across 20-30 builds per day.
OS | actions/checkout | go test (cache hit) |
---|---|---|
ubuntu-22.04 | 2-3 secs | 8 secs |
macOS-11 | 5 secs | 9 secs |
windows-2022 | 15-20 secs | 120+ secs |
Notice that actions/checkout is consistently slower on Windows. Yes, there is a network access element to this, but it is minimal.
The go test comparison is the cleanest comparison. As indicated in the column heading, these are the timings for a full cache hit, i.e. no network access required, no rebuilds required, and no tests actually running (because they hit the test cache). So the go test command is purely a function of disk access and CPU. For this command, Windows is consistently 15 times slower than Linux. macOS is, pleasingly, comparable with Linux.
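To make the headline ratio concrete, here is a minimal Python sketch that derives the slowdown factors from the table above. It assumes the lower bound of each reported range (so 120 seconds for Windows is a floor, not an average), and the TIMINGS dictionary and slowdown() helper are illustrative names, not part of any CUE tooling.

```python
# Approximate per-step timings in seconds, taken from the table above.
# Ranges are represented by their lower bound (e.g. "120+" -> 120).
TIMINGS = {
    "ubuntu-22.04": {"checkout": 2, "go_test_cached": 8},
    "macOS-11": {"checkout": 5, "go_test_cached": 9},
    "windows-2022": {"checkout": 15, "go_test_cached": 120},
}


def slowdown(step: str, runner: str, baseline: str = "ubuntu-22.04") -> float:
    """Return how many times slower `runner` is than `baseline` for `step`."""
    return TIMINGS[runner][step] / TIMINGS[baseline][step]


if __name__ == "__main__":
    for runner in TIMINGS:
        print(runner, slowdown("go_test_cached", runner))
```

With these inputs the cached go test step comes out at 15 times the Linux figure on Windows, which is where the "15 times slower" number comes from.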
Platforms affected
- [ ] Azure DevOps
- [X] GitHub Actions - Standard Runners
- [ ] GitHub Actions - Larger Runners
Runner images affected
- [ ] Ubuntu 18.04
- [ ] Ubuntu 20.04
- [ ] Ubuntu 22.04
- [ ] macOS 11
- [ ] macOS 12
- [X] Windows Server 2019
- [X] Windows Server 2022
Image version and build link
For windows-2019:
Current runner version: '2.303.0'
Operating System
Microsoft Windows Server 2019
10.0.17763
Datacenter
Runner Image
Image: windows-2019
Version: 20230314.1
Included Software: https://github.com/actions/runner-images/blob/win19/20230314.1/images/win/Windows2019-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/win19%2F20230314.1
Runner Image Provisioner
2.0.127.1
For windows-2022:
Current runner version: '2.303.0'
Operating System
Microsoft Windows Server 2022
10.0.20348
Datacenter
Runner Image
Image: windows-2022
Version: 20230314.1
Included Software: https://github.com/actions/runner-images/blob/win22/20230314.1/images/win/Windows2022-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20230314.1
Runner Image Provisioner
2.0.127.1
Is it regression?
Unclear
Expected behavior
Windows runners to be comparable in terms of speed to Linux and macOS for actions/checkout and go test steps.
Actual behavior
Windows runners consistently taking 15 times as long as Linux and macOS for CPU and disk-intensive commands.
Repro steps
The CUE repo itself is quite involved. So as a proxy for something that is relatively CPU and disk intensive we have created a slimmed down repo using actions/checkout.
https://github.com/myitcvscratch/slow-windows-actions
See the most recent run for results:
https://github.com/myitcvscratch/slow-windows-actions/actions/runs/4477601338
Looking at averages of this setup across a number of runs we see similar figures for actions/checkout to those seen in our CUE setup:
Runner | actions/checkout |
---|---|
ubuntu-20.04 | 3 secs |
ubuntu-22.04 | 3 secs |
macos-11 | 5 secs |
macos-12 | 5 secs |
windows-2019 | 20 secs |
windows-2022 | 20 secs |
So whilst this doesn't include the go test step (because getting a warm cache is a tricky step to reproduce) the use of actions/checkout is a sufficiently good proxy to show the problem.
I suspect it might be Windows Defender in action. Can you try the following (in order to check whether disabling Windows Defender can help)?
- run: Set-MpPreference -DisableRealtimeMonitoring $true
  shell: powershell
Here are many other people similarly seeing that actions/checkout is very slow on Windows: https://github.com/actions/checkout/issues/1150
@ilia-shipitsin
I suspect it might be Windows Defender in action. Can you try the following (in order to check whether disabling Windows Defender can help)?
See commit https://github.com/myitcvscratch/slow-windows-actions/commit/8b71de30da15b600ca0d320f9415b7092e16ea6d which resulted in run https://github.com/myitcvscratch/slow-windows-actions/actions/runs/4480846656/jobs/7876717612. It has basically no effect.
I also tried making this change in the CUE project in https://review.gerrithub.io/c/cue-lang/cue/+/551317. That resulted in https://github.com/cue-lang/cue-trybot/actions/runs/4476128503 which again showed no effect.
So assuming my testing is valid, turning off Windows Defender does not appear to have any effect in our case.
Here are many other people similarly seeing that actions/checkout is very slow on Windows: actions/checkout#1150
Thanks, @mvdan.
Just to emphasise however that the use of actions/checkout in my example above is a proxy for the much bigger problem we are seeing in the go test step in the CUE project. It's very likely not the best proxy, but hopefully good enough.
The reason I flag this is that ultimately we will consider this issue "fixed" when the go test step is "fast" and not just an improvement in the actions/checkout step.
I've also tried to disable Windows Defender, no significant difference so far
https://github.com/ilia-shipitsin/slow-windows-actions/actions/runs/4483502669/jobs/7882868552
@myitcv, as for the go test task performance degradation, do you see appropriate performance on a standalone (not related to GH Actions) Windows 2019/2022 server?
I mean, is the degradation runner specific or platform specific?
@myitcv, as for the go test task performance degradation, do you see appropriate performance on a standalone (not related to GH Actions) Windows 2019/2022 server?
I mean, is the degradation runner specific or platform specific?
We only have the numbers from GitHub actions workflow runs.
We need to narrow it down: does a standalone Windows server show the same slowness or not?
Duplicate of https://github.com/actions/runner-images/issues/5166
@mikhailkoliada - isn't #5166 demonstrating a slowdown between 2019 and 2022?
The numbers we are seeing show consistent slowness on both 2019 and 2022.
Therefore I'm not clear this is a duplicate.
Side note: I did some investigation on the "checkout slowness". It looks like there's some delay between git and the automation task. git itself takes 2-3 seconds (I put the commands into cmd and wrapped them with Measure-Command { ... }).
I can believe that agent communication could add 15-20 sec, but it does not look like a root cause for the go test slowness.
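The measurement described above (timing the command itself rather than the whole workflow step) can be approximated outside PowerShell too. Below is a rough Python equivalent of the Measure-Command approach; the timed() helper is a hypothetical name, and the command in the example is only a placeholder for whatever git invocation is under investigation.

```python
import subprocess
import sys
import time


def timed(cmd: list[str]) -> tuple[float, int]:
    """Run `cmd` and return (elapsed wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    # Placeholder command; substitute the git invocation being investigated.
    elapsed, code = timed([sys.executable, "-c", "pass"])
    print(f"exit={code} elapsed={elapsed:.2f}s")
```

Comparing this figure with the step duration shown in the Actions UI gives a rough idea of how much time is spent outside the command itself.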
Hey! let me chat with the team and see what is going on and see if we need a separate wrap up ticket for Windows Perf (or I will get someone to re-open this one!)
@mvdan @myitcv we think this may be the same as the checkout issue. Given that 'hunch' we will tackle that tracking our progress here: https://github.com/actions/checkout/issues/1186
We will keep this issue open until we can validate if it is a dupe and go from there (or start working on a new root cause for this after I guess 😱 let's hope not!)
@nebuk89 - thanks for looking into this and the detailed update. Much appreciated.
I would like to add that the situation seems to be even worse with larger runners. We have to use 8-core Windows runners (the regular one runs out of heap space) and they are only twice as fast as the 2-core machines.
On Linux the 8-core vs 2-core performance is as expected (though 8-core machines are still slower in some aspects; actions like checkout take 4 times longer etc.) and the steps that can be parallelised (the build step, for example) in fact take approximately 4 times less time. On Windows that ratio is around 2.
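One way to express the scaling gap is as a ratio of observed speedup to the ideal speedup from the extra cores. The sketch below is only illustrative: scaling_efficiency() is a hypothetical helper, and the inputs are the approximate figures quoted above (about 4x on Linux and about 2x on Windows when going from 2 to 8 cores).

```python
def scaling_efficiency(speedup: float, small_cores: int, large_cores: int) -> float:
    """Observed speedup divided by the ideal speedup from the extra cores."""
    ideal = large_cores / small_cores
    return speedup / ideal


if __name__ == "__main__":
    # Approximate speedups for the parallelisable build step, 2-core -> 8-core.
    print("linux:", scaling_efficiency(4.0, 2, 8))
    print("windows:", scaling_efficiency(2.0, 2, 8))
```

On these figures Linux uses the extra cores at roughly full efficiency, while Windows gets about half of the ideal benefit.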
Providing some data here as a CI tool developer:
On Windows, a job that takes ~5 minutes on macOS and Ubuntu tends to stretch to over an hour with no known cause (Ref: https://github.com/corygalyna/AutomataCI/actions/runs/7083465554/job/19275876384) as of the date of this posting.
This has been observed since November 2023 and it is gradually becoming unusable. Some CI tasks run serially (ref: CI - Materialize) and all the commands run via a local PowerShell script. The timing statistics make no sense:
- For the Materialize CI job, the Rust compiler reported using only ~2 seconds for a single, host-oriented build, but the job itself took 15 minutes to complete (where are the other 14+ minutes being spent?)
The delay is usually at the start or at the end of a job. Can't debug further without some kind of diagnostic guidance.
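To put a number on the gap described above, here is a small Python sketch that subtracts the tool-reported time from the job's wall-clock time. The unaccounted() helper is a hypothetical name, and the inputs are the approximate figures from this comment (a 15 minute job with about 2 seconds of reported compile time).

```python
def unaccounted(job_seconds: float, tool_seconds: float) -> tuple[float, float]:
    """Return (seconds not explained by the tool, that amount as a fraction of the job)."""
    gap = job_seconds - tool_seconds
    return gap, gap / job_seconds


if __name__ == "__main__":
    gap, fraction = unaccounted(job_seconds=15 * 60, tool_seconds=2)
    print(f"{gap:.0f}s unaccounted ({fraction:.1%} of the job)")
```

On those figures almost the entire job duration falls outside the compiler's own reported time, which is consistent with the delay sitting at the start or end of the job.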
Update (Dec 7, 2023 08:32 UTC):
I think GitHub solved something: https://github.com/corygalyna/AutomataCI/actions/runs/7125351326/job/19401138941
Suddenly, it's blazing fast.
Update (Dec 8, 2023 00:14 UTC):
Never mind. It's back to its sluggishness: https://github.com/corygalyna/AutomataCI/actions/runs/7129085125/job/19412375232
It is probably a server-related problem.
Same issues here: https://github.com/1kastner/conflowgen/actions/runs/7202425735/job/19620518126 and https://github.com/1kastner/conflowgen/actions/runs/7202425735/job/19620518686.
It takes ages for the action to start. Before the test suite finishes (on my local laptop a matter of few minutes), it is aborted because I assume my time budget is consumed.
This is also the case when building NuGet packages (the .NET/C# ecosystem). The process takes 57 seconds for my small project on windows-latest and 13 seconds on ubuntu-latest.
I switched from windows-2022 to windows-2019 and the speed increased tremendously.
I switched from windows-2022 to windows-2019 and the speed increased tremendously.
Any chance we can get Windows XP runners?
I switched from windows-2022 to windows-2019 and the speed increased tremendously.
For completeness, this had no impact in our situation.
Switching to windows-2019 and disabling Windows Defender did not improve performance in my use-case.
Also having issues with Windows runners; they are extremely slow compared to their macOS and Ubuntu counterparts.
(Can't link as the actions are in a private repo)
Our tests are now running at 50+ mins compared to < 10 mins on Linux. Moving to windows-2019 didn't fix anything, and neither did disabling Windows Defender.
We are moving away from GitHub managed runners for these, moving to in-house self-hosted runners.
For what it's worth, over at Astral we switched to using a Dev Drive with ReFS to great benefit https://github.com/astral-sh/uv/pull/3522 (the GitHub Windows Runners are still the bane of my existence though)
End of July 2024 - any progress on this oldish Windows anomaly?
My partner and I test our work on Ubuntu, MacOS, and Windows (significantly slower) to make sure we can run on all 3 environments. We are lucky that our runs are short at the moment (< 8 minutes) but expect project growth and therefore longer run times in the future.