
the "Extracting" step of Windows containers with Docker still takes an eternity

Open doctorpangloss opened this issue 5 months ago • 5 comments

Describe the bug

~ $ docker run -it mcr.microsoft.com/windows/server:ltsc2022
Unable to find image 'mcr.microsoft.com/windows/server:ltsc2022' locally
ltsc2022: Pulling from windows/server
01c4baad83ab: Extracting [=======================>                           ]  1.421GB/3.014GB
ce7126e7d668: Download complete

It takes like 140 seconds to extract 2GB on a very powerful machine. This is nuts. There must be a better way.

To Reproduce

See above.

Expected behavior

Once an image is downloaded, it should be fast to start running it.

Configuration:

  • Edition: Windows Server 2022
  • Base Image being used: mcr.microsoft.com/windows/server
  • Container engine: docker
  • Container Engine version: 28

doctorpangloss avatar Aug 04 '25 22:08 doctorpangloss

Thank you for creating an Issue. Please note that GitHub is not an official channel for Microsoft support requests. To create an official support request, please open a ticket here. Microsoft and the GitHub Community strive to provide a best effort in answering questions and supporting Issues on GitHub.

github-actions[bot] avatar Aug 04 '25 22:08 github-actions[bot]

Thanks for bringing this up. We'll take a look at which steps are taking a long time and try to figure out why that is.

ntrappe-msft avatar Aug 08 '25 22:08 ntrappe-msft

I have experienced very slow image extraction as well.

I don't know if this story would help, but in case it does... I suspect that the built-in Windows file decompression tools are pretty slow for some reason, and if they are being used, they may be a bottleneck.

Many moons ago I ran some experiments to measure the time it took to decompress zip files on Windows. Originally, I used Expand-Archive, the PowerShell cmdlet built into Windows. I got fed up and tried CMake's built-in tar sub-command to see if it was better... and it was 5+ times faster. No idea why, and I'm not suggesting a specific solution, but perhaps there is an opportunity to find a better file extraction library and bake it into Docker on Windows...

P-N-L avatar Aug 29 '25 16:08 P-N-L

I don't know if this story would help, but in case it does... I suspect that the built-in Windows file decompression tools are pretty slow for some reason, and if they are being used, they may be a bottleneck. [...] I got fed up and used CMake's built-in tar sub-command to see if it was better... and it was 5+ times faster.

It's more complicated than that. Don't try to use CMake's built-in tar here: it doesn't understand alternate data streams, and it wouldn't create the expected .$wcidirs$ metadata files from directories/symlinks either.

However, you are right that extracting the tar file would usually still be a trivial task. moby is doing a lot sequentially/concurrently here, though: gzip decompression, the gzip CRC32 checksum, a SHA-256 checksum over the tar byte stream, collecting metadata as JSON, passing the tar file to a sub-process over stdin for privilege separation, handling of alternate data streams, and a couple more things.

One quite obvious trap here is the inter-process communication over stdin: passing the unbuffered os.Stdin directly into the tar reader (https://github.com/moby/moby/blob/master/daemon/graphdriver/windows/windows.go#L793, https://github.com/moby/moby/blob/master/daemon/graphdriver/windows/windows.go#L726) results in pretty bad scheduling, especially when antivirus (AV) software comes into play and starts stalling both the reads in the original process and the file system writes in the other one, so the stdin buffer frequently ends up drained or full.

It's quite likely that the responsible developers missed this, because there's a backdoor for development (DOCKER_WINDOWSFILTER_NOREEXEC) that keeps the data in-process and in-stack via goroutines, the same as happens by default on non-Windows platforms. It is also easy to miss because this code performs differently depending on the type of CPU and on where each process gets scheduled...

Ext3h avatar Sep 22 '25 11:09 Ext3h

This issue has been open for 30 days with no updates. @Howard-Haiyang-Hao, please provide an update or close this issue.
