Checkout suddenly much slower on Windows environment
Moved from https://github.com/actions/runner-images/issues/7180
Description
For the past few days, the duration of the actions/checkout@v3 step on windows-2019 has increased dramatically.
This behavior is seen on all my repos (all private). Below is a table showing an example of before/after.
| Checkout step | Feb 17th | Feb 21st |
|---|---|---|
| 1 | 13s | 1m35s |
| 2 | 8s | 47s |
The result is a huge increase in build (and billable) time.


The GitHub status page does show some issues around this time frame, but these were all marked as resolved:

Platforms affected
- [ ] Azure DevOps
- [X] GitHub Actions - Standard Runners
- [ ] GitHub Actions - Larger Runners
Runner images affected
- [ ] Ubuntu 18.04
- [ ] Ubuntu 20.04
- [ ] Ubuntu 22.04
- [ ] macOS 11
- [ ] macOS 12
- [X] Windows Server 2019
- [ ] Windows Server 2022
Image version and build link
Current runner version: '2.301.1'
Operating System
Microsoft Windows Server 2019
10.0.17763
Private repo
Is it regression?
Yes (sorry, private repos)
Expected behavior
The build times should be fairly constant.
Actual behavior
Build times explode, burning through our build minutes too fast.
Repro steps
Compare build times on any Windows environment from before Feb 18th with today.
I have observed this too. Example job: https://github.com/Tyrrrz/CliWrap/actions/runs/4271628026
- Windows – checkout took 4m9s:

- Ubuntu – checkout took <1s:

- macOS – checkout took 3s:

Note that this behavior is fairly inconsistent, and checkout usually completes quickly on Windows. But when it is slow, it's only ever slow on Windows.
I've encountered this and debugged it a bit by heavily instrumenting the transpiled code in `dist/index.js`. It turns out that there are multiple PowerShell calls that each stall for dozens of seconds. Here is the example run I analyzed:
- here, 1 minute and 14 seconds are spent,
- here, 11 seconds are spent,
- here, 12 seconds are spent,
- here, 11 seconds are spent,
all on this PowerShell call:
`(Get-CimInstance -ClassName Win32_OperatingSystem).caption`
totaling a whopping 1 minute and 48 seconds just to determine a Windows release that the actions/checkout action has no need to know.
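If you want to observe that stall directly on an affected runner, a minimal sketch of a workflow step that times the very same query (the step itself is mine, not part of actions/checkout):

```yaml
# Hypothetical diagnostic step (not part of actions/checkout): time the CIM
# query that windows-release runs when the User-Agent string is built.
- name: Time the Win32_OperatingSystem query
  shell: pwsh
  run: |
    Measure-Command {
      (Get-CimInstance -ClassName Win32_OperatingSystem).Caption
    } | Select-Object TotalSeconds
```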
To make things even worse, the same PowerShell invocation also happens multiple times during the post-action:
I do not really understand what changed between Feb 17th and 21st that would explain this slowdown. There has not been a new actions/checkout release since Jan 5th...
I have briefly seen success trying to hard-code the fix from https://github.com/sindresorhus/windows-release/pull/18 into `dist/index.js`, but it does not seem to work once I undelete all the core logic. Sigh.
At this point, I have to tend to other things, but I thought I'd leave my current findings here in case somebody else can take over (and try things like overriding os.release() to always return 2022 or something like that, before letting @octokit/endpoint initialize its User-Agent string).
I made an attempt to avoid calling into windows-releases from the checkout action in #1246.
If you want to try you can use
- uses: BrettDong/checkout@octokit
and see if there is any improvement in time stalled between starting the checkout action and performing actual git operations.
@BrettDong excellent! I tested this and the times are back to decent levels: the entire checkout step takes 11 seconds with your PR. Thank you so much!
The fix in #1246 reduces stalled time down to 3 seconds.

During the 3 seconds the workflow is stalled on loading the node.exe from disk to memory for execution. I don't think there is anything I can do to get rid of it.
I have had a similar issue with large runners with slow checkout and cleanup that I reported to GitHub Support. They concluded that it is related to this issue, even though I am not completely convinced.
The screenshot from @Tyrrrz earlier in this issue also shows a slow post checkout (cleanup).
Workflow:
To reproduce this with a minimal setup, I created a new repository with only a single workflow file, spinning up three jobs on default and large runners:
```yaml
name: Demo Slow Checkout

on:
  workflow_dispatch:

permissions:
  contents: read

jobs:
  doSomething:
    name: My Job
    strategy:
      matrix:
        os: [windows-latest, windows-latest-8core, windows-latest-16core]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout
        uses: actions/checkout@8f4b7f84864484a7bf31766abe9204da3cbe65b3 # v3.5.0
        # https://github.com/actions/checkout/releases
        with:
          # Fetch full depth
          fetch-depth: 0
      - name: Work for 15 seconds
        run: Start-Sleep -Seconds 15
```
In addition to checkout, the job has only one other step: sleeping for 15 seconds.
Results:
The jobs were executed 10 times, and the results show that:
- Checkout
  - Normal time is 19 seconds
  - Large runners take more than twice as long (between 47 and 51 seconds)
- Post Checkout
  - Normal time is 3 seconds
  - Large runners take about 15 times longer (between 46 and 49 seconds)
ℹ️ The last row is the median, not the average, so that no single slow run skews the result.
Findings:
Finding 1:
Every single checkout on the large runners is at least twice as slow as on a regular runner, and all of the extra time passes before the actual checkout starts:

Finding 2:
The post checkout (cleanup) is on average 15 times slower than on a regular runner, and again all of the time passes before any cleanup starts:

Finding 3:
The simple sleep task on the regular runner takes twice the sleep interval. How is it even possible that sleeping for 15 seconds takes almost double the time? This was done with a simple `run: Start-Sleep -Seconds 15`.
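One way to tell whether that extra time is spent inside the step or in runner overhead around it is to measure the sleep from within the step and compare it to the step duration the runner reports. A minimal sketch (mine, not part of the workflow above):

```yaml
# If TotalSeconds is ~15 while the runner reports ~30s for the step, the
# difference is runner overhead around the step, not the sleep itself.
- name: Timed sleep
  shell: pwsh
  run: Measure-Command { Start-Sleep -Seconds 15 } | Select-Object TotalSeconds
```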

You can build something like https://github.com/BrettDong/Cataclysm-DDA/blob/etw/.github/workflows/etw.yml to collect ETW traces on the runner and diagnose what is happening and where the time is spent during checkout.
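For reference, a trimmed sketch of what such a trace-collection job could look like; the WPR profiles and the artifact upload below are my assumptions, not a copy of the linked etw.yml:

```yaml
# Hypothetical ETW collection around checkout using Windows Performance Recorder.
- name: Start ETW trace
  shell: pwsh
  run: wpr -start CPU -start FileIO

- name: Checkout
  uses: actions/checkout@v3

- name: Stop ETW trace
  if: always()
  shell: pwsh
  run: wpr -stop trace.etl

- name: Upload trace
  if: always()
  uses: actions/upload-artifact@v3
  with:
    name: etw-trace
    path: trace.etl
```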
@BrettDong
During the 3 seconds the workflow is stalled on loading the node.exe from disk to memory for execution. I don't think there is anything I can do to get rid of it.
This seems like an interesting point in and of itself, because in https://github.com/actions/runner-images/issues/7320, while we report actions/checkout issues, we see much bigger problems in workflows that rely heavily on disk access (e.g. cache hits during test runs).
So if I understand your conclusion correctly, narrowing this down to a disk-access issue (or something that limits the effective disk-access rate) is consistent with what we are seeing.
I have had a similar issue with large runners with slow checkout and cleanup that I reported to GitHub Support. They concluded that it is related this issue, even though I am not completely convinced.
Same here - we've noticed that large GitHub-managed Windows runners are substantially slower during checkout. This is not a recent regression for us though - they've been (nearly unusably) slow for months.
| Runner | Checkout time |
|---|---|
| Self hosted Linux | 40s |
| small GitHub-managed Windows (windows-2022) | 2m30s |
| large GitHub-managed Windows | 12m |
We also have a ticket with GitHub Support, and I've been running experiments for our repo / workflows at https://github.com/openxla/iree/pull/12051.
Great news - build times on the Windows environment are back to normal! This was fixed in actions/checkout@v3.5.1.
Thanks @BrettDong and @fhammerl

FYI the issue that @ScottTodd and I are seeing on the large Windows managed runners was not fixed by this update. We tested it prior to release at the direction of GitHub support:
- yaml file (same location): https://github.com/ScottTodd/iree/blob/ci-windows-checkout-debug/.github/workflows/ci.yml
- logs: https://github.com/openxla/iree/actions/runs/4610834430
Seems like it may be a separate issue, but just wanted to call it out since these issues seem like they were maybe merged. Seems like this is also something that @Gakk is hitting. My understanding from support is that they're still investigating this other problem. It may be worth opening a separate issue for this or leaving this open.
https://github.com/actions/checkout/pull/1246#issuecomment-1499276093
try: https://github.com/actions/checkout/issues/1186#issuecomment-1484896561
Seems like this is also something that @Gakk is hitting.
@GMNGeoffrey, I have done extensive testing and confirmed that my issues were resolved by actions/checkout version 3.5.1.
Yep, I'm still seeing substantially slower checkouts on large runners (could break that out into a different issue, and we have a support ticket for it). Latest experiments on https://github.com/openxla/iree/pull/12051, logs at https://github.com/openxla/iree/actions/runs/4748455722/jobs/8434667258. Our repo depends on https://github.com/llvm/llvm-project/ (very large) and a few other submodules, and just the checkout takes ~1 minute on small Windows runners but 7+ minutes on large Windows runners. We've tried all sorts of ways to change the git commands used (sparse checkouts, shallow clones, caching of git files, etc.) but can't get past whatever the differences are between the runners themselves.
As mentioned in a comment above, can you try this? https://github.com/BrettDong/Cataclysm-DDA/blob/etw/.github/workflows/etw.yml
I don't have access to larger runners currently to test myself.
I'm still seeing substantially slower checkouts on large runners
Just a wild guess: could it be that large runners have slower D: drives than smaller runners? IIRC the hosted runners specifically have very fast D: drives.
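If anyone wants to sanity-check that guess, a rough step like the one below times a sequential write to the workspace drive (the D: drive letter and file size are assumptions, and OS caching makes this a smoke test rather than a real benchmark):

```yaml
# Hypothetical quick check: write 256 MB to D: and report how long it took.
- name: Rough D drive write check
  shell: pwsh
  run: |
    $data = New-Object byte[] 256MB
    Measure-Command { [IO.File]::WriteAllBytes('D:\bench.tmp', $data) } |
      Select-Object TotalSeconds
    Remove-Item 'D:\bench.tmp'
```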
I've just verified that the issue is not with the runners themselves, but rather with actions/checkout. Using just normal git commands in bash, I did a full checkout with submodules in 1m30s, compared to almost 10m previously. By dropping actions/checkout and actions/cache and rolling our own fairly unsophisticated replacements, I've been able to drop a maximally cached build from 20m (and that's when fetching the cache doesn't just fail entirely) to 9m. That is just really pretty sad.
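For anyone who wants to try the same thing, checking out with plain git commands in a run step looks roughly like the sketch below; the token handling and depth settings are simplified assumptions, not our exact workflow:

```yaml
# Rough, hand-rolled equivalent of actions/checkout.
- name: Manual checkout
  shell: bash
  run: |
    git init .
    git remote add origin "https://x-access-token:${{ github.token }}@github.com/${{ github.repository }}.git"
    git fetch --depth=1 origin "${{ github.sha }}"
    git checkout --quiet FETCH_HEAD
    # Public submodules only; private ones need extra credential setup.
    git submodule update --init --depth=1 --jobs 4
```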
You can collect ETW traces to help diagnose what's happening and where the time goes during the checkout action.
You can collect ETW traces to help diagnose what's happening and where the time goes during the checkout action.
As mentioned in a comment above, can you try this? https://github.com/BrettDong/Cataclysm-DDA/blob/etw/.github/workflows/etw.yml
I don't have access to larger runners currently to test myself.
Yeah, but presumably so can the GitHub engineers who support says are working to fix this. Like, IDK, it kind of seems to me that the people who wrote this code, control these VMs, and whom we are paying for this service, could maybe take a look at the issues with it.
@GMNGeoffrey I would like to encourage you to consider the current macroeconomic climate, and also just how large the roadmap is. And please also note how @BrettDong's effort was rewarded: I am sure that you, too, will get what you want much quicker if you dig a little deeper with those ETW traces. I would even consider helping, but I do not have access to those large runners; you do, though.
~~👋 hey sorry for not seeing this one, we are tracking/providing commentary on our investigation on the Windows larger runner checkout here: oops that's the internal issue and no good to anyone~~
Ok that first update was all kinds of wrong let me try again!
Sorry we still haven't commented on this ticket; we are tracking it internally. We have made some changes to the Windows VM image, but only recently, and they don't appear to have helped. With everything else going on we have had to put this one aside for the last couple of weeks, but we are committed to fixing it. I will re-open this ticket, as it is linked in the issue we are tracking internally :)
Thanks Ben. Further runs suggest that my switch to use git commands directly instead of actions/checkout was just lucky the first few times (or the computer learned what I was trying to do and put a stop to it :stuck_out_tongue:). Subsequent runs have had similar latency to before the switch, I think (I started hacking together a script to collect statistics for jobs over time, but got side-tracked, so pure anecdata right now). So I'm back to thinking it's the VM+git itself and not the action. I am sort of considering getting tarballs for all of the submodules instead of using git... I'll update if that seems to be faster somehow (which would suggest to me something git-specific and not just IO or network issues)
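The tarball idea would look roughly like this per submodule (the repository path and pin are placeholders, and this drops the .git metadata, so it only works for read-only checkouts):

```yaml
# Hypothetical: fetch a submodule snapshot as a tarball instead of cloning it.
- name: Fetch llvm-project snapshot as a tarball
  shell: bash
  run: |
    mkdir -p third_party/llvm-project
    curl -sSL "https://codeload.github.com/llvm/llvm-project/tar.gz/<pinned-commit>" \
      | tar -xzf - --strip-components=1 -C third_party/llvm-project
```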
Thanks @GMNGeoffrey for having a go! (and sorry that the computers are learning 😆 ) Let me know if it's faster and we will hopefully have our focus back on this in the next week or so as things settle (also turns out I am not a maintainer on this repo, I will find someone to re-open this for me :D)
Just a reminder: given the evidence in https://github.com/actions/runner-images/issues/7320, there is almost certainly an underlying issue that is not specific to git at all, but rather affects all operations that place high demands on disk (and CPU).
Re-opening at @nebuk89's request, so we can track our efforts externally as we investigate further. Some valuable context built up in this thread 😄 .
I'm not using these runners, but if the OS is showing high CPU on disk access, perhaps it's due to a host disk caching setting applied by Azure to the VM disk (see https://learn.microsoft.com/en-us/azure/virtual-machines/disks-performance): while host disk caching benefits some modes, it can also add a penalty.
Not sure if this helps, since we are running self-hosted GitLab, but I started looking for a solution because our Windows runners are incredibly slow. Simple build jobs (for instance, just running MSBuild on a solution) that finish in less than 1 minute when run manually on the same machine take over an hour when run as a gitlab-runner job. The very same script is executed, with no manual deviation between the two procedures. Further potentially helpful details:
- We have runners on virtual machines and on physical ones - makes no difference, both are concerned
- Runners use Windows 10
- There is no load to be seen - not on CPU, memory, or disk; everything is silent and idle
- So far I have failed to pinpoint the delays to any particular step. E.g. even the output of MSBuild is tremendously slowed down: lines that normally appear within fractions of a second trickle in only every other minute
It almost sounds like the runner is slow at (or blocking on) reading stderr/stdout from the console windows of the processes it has launched, which in turn blocks those processes from advancing.
We are discontinuing our use of GH managed windows runners. The costs were already beyond premium/sustainable, and the performance is so poor that the issue compounds out of control. I don't consider this a viable way to run CI for any business.
I can tolerate a lot but not at massively inflated prices.