Unable to use Buildkit with Windows containers

Open tofflos opened this issue 6 years ago • 83 comments

I'm using the Buildkit version that comes bundled with Docker for Windows 18.06.1 and am experiencing some trouble running it with Windows containers. In the log below you can see a build succeed for a very simple build running without Buildkit and then failing once I enable it. The localized error message "Det går inte att hitta filen" roughly translates to "Unable to find the file". I've had success running Buildkit on the same system when running Linux containers. A minimal project that reproduces the error can be found here test.zip.

PS C:\test> docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:34 2018
 OS/Arch:           windows/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.24)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:36:40 2018
  OS/Arch:          windows/amd64
  Experimental:     true
PS C:\test> ls


    Directory: C:\test


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       2018-09-11     15:38             74 Dockerfile
-a----       2018-09-11     15:39             23 test.txt


PS C:\test> type .\Dockerfile
FROM microsoft/nanoserver:1803
COPY test.txt /test.txt
RUN type test.txt

PS C:\test> $Env:DOCKER_BUILDKIT=0
PS C:\test> docker build -t test .
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM microsoft/nanoserver:1803
 ---> 693ff1719e39
Step 2/3 : COPY test.txt /test.txt
 ---> 3cb8bc9e5e2e
Step 3/3 : RUN type test.txt
 ---> Running in 376f873629fd
This is a test message!Removing intermediate container 376f873629fd
 ---> 0cce47564a2d
Successfully built 0cce47564a2d
Successfully tagged test:latest

PS C:\test> $Env:DOCKER_BUILDKIT=1
PS C:\test> docker build -t test .
[+] Building 0.2s (2/2) FINISHED
 => local://dockerfile (Dockerfile)                                                                                                                                                                                                                                       0.1s
 => => transferring dockerfile: 31B                                                                                                                                                                                                                                       0.0s
 => local://context (.dockerignore)                                                                                                                                                                                                                                       0.1s
 => => transferring context: 2B                                                                                                                                                                                                                                           0.0s
failed to read dockerfile: open C:\ProgramData\Docker\tmp\buildkit-mount977689469\Dockerfile: Det går inte att hitta filen.

tofflos avatar Sep 11 '18 18:09 tofflos

Buildkit is not supported for Windows containers in docker 18.06/18.09

tonistiigi avatar Sep 11 '18 20:09 tonistiigi

Any plans to support it?

gerich-home avatar Jan 18 '19 14:01 gerich-home

If there is no Windows container support yet, I think the error message needs to be updated to set expectations.
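
Until the builder reports this itself, a wrapper script could check the daemon's container OS before enabling BuildKit. The following is only a sketch: the helper names `daemon_os_type` and `buildkit_env` are hypothetical, and it assumes the `docker` CLI is on the PATH and that `docker info --format "{{.OSType}}"` reports `linux` or `windows`.

```python
import subprocess


def daemon_os_type() -> str:
    """Ask the Docker daemon which container OS it is running."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.OSType}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def buildkit_env(os_type: str) -> dict:
    """Return the environment override to use for `docker build` (hypothetical helper)."""
    if os_type == "linux":
        return {"DOCKER_BUILDKIT": "1"}
    # Windows containers: fall back to the classic builder instead of
    # failing later with a misleading "cannot find the file" error.
    return {"DOCKER_BUILDKIT": "0"}
```

A caller would merge `buildkit_env(daemon_os_type())` into the environment passed to `docker build`.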

quangkieu avatar May 07 '19 06:05 quangkieu

@quangkieu it appears to be described in the documentation (https://docs.docker.com/build/buildkit/#getting-started): "Only supported for building Linux containers".

olljanat avatar Jun 01 '19 11:06 olljanat

@olljanat I meant the error message from the build process.

quangkieu avatar Jun 05 '19 21:06 quangkieu

When is buildkit support coming for windows?

Barsonax avatar Nov 10 '19 12:11 Barsonax

Maybe a better question is what needs to be done/what are the outstanding dependencies?

TBBle avatar Nov 12 '19 01:11 TBBle

Has anyone tried using buildctl on Windows via instructions at https://github.com/moby/buildkit#exploring-dockerfiles with buildkit daemon running in a container? Looks like that might be an alternative until docker build works properly on Windows?

Iristyle avatar Dec 19 '19 22:12 Iristyle

@Iristyle if you read that doc more carefully it also says

the buildkitd daemon is only available for Linux currently.

@Barsonax I'm a bit worried that we will never see Windows container support, because there are no Microsoft people contributing to this project. Hopefully I'm wrong.

olljanat avatar Dec 20 '19 02:12 olljanat

@olljanat well, I'm using LCOW, which hosts a real Linux kernel, so it's a bit of a grey area (and something a lot of the Docker folks don't seem to know much about in practical terms). I played around a little and I was getting closer to having rootless running per the instructions at https://github.com/moby/buildkit/blob/master/docs/rootless.md#about---oci-worker-no-process-sandbox, noting that --privileged is not supported on Windows at all.

I'll update if I'm able to get it going or hit a dead end.

Iristyle avatar Dec 20 '19 03:12 Iristyle

@Iristyle that is probably possible, but this issue is about real Windows containers, so let's try to keep on topic.

olljanat avatar Dec 20 '19 05:12 olljanat

Since last time I looked into this, containerd gained support for Windows 10 1809/Windows Server 2019, so it's possible no MS involvement in buildkit is needed, if it can get everything it needs for the low-level part via its containerd backend.

Edit: A quick look at the build system for buildkit suggests that you need a running buildkit (either locally, or inside Docker) to build buildkit. I'm somewhat flummoxed by this.

TBBle avatar Dec 30 '19 00:12 TBBle

@TBBle hmm. Yeah, there is some info about containerd support at https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/containerd so maybe it is possible.

Then someone could try building buildkitd.exe for Windows to see where it fails. I also guess that the latest Docker binaries with containerd support are needed (more info about that in https://github.com/moby/moby/pull/38541).

olljanat avatar Dec 30 '19 06:12 olljanat

Ah, thank you. moby/moby#38541 is the PR reference I was looking for earlier.

Poking through, containerd doesn't seem to publish Windows binaries in their releases despite having the new Windows V2 runtime in their 1.3.0 release, and their AppVeyor build pipeline doesn't capture artifacts.

The required hcsshim project does publish artifacts from their AppVeyor pipeline, even though they don't include them in their releases.

Both have recent-enough releases to meet the criteria laid out in moby/moby#38541 but they both also have active work on master which might make a difference.

containerd currently vendors a specific commit of hcsshim (Microsoft/hcsshim@d2849cbdb9dfe5f513292a9610ca2eb734cdd1e7), binaries for which can be fetched from AppVeyor. For containerd 1.3.2 (Microsoft/hcsshim@9e921883ac929bbe515b39793ece99ce3a9d7706) the binaries are also on AppVeyor but will expire in late February. Both of these vendored versions are older than the current hcsshim release, 0.8.7, whose artifacts are also on AppVeyor.

In the end, it's not clear to me if this ecosystem is yet in a state to start trying to get BuildKit working, and containerd/containerd#1920 (which has not been updated since the switch to the Windows V2 API) gives me a reasonable level of doubt.

TBBle avatar Jan 03 '20 05:01 TBBle

Quick correction: Containerd does have nightly builds for Windows, they're at https://github.com/containerd/containerd/actions?query=workflow%3ANightly

TBBle avatar Jan 04 '20 10:01 TBBle

So with a bit of hacking I got containerd working on my Windows 10 Desktop (mostly blocked by a bug recently introduced into containerd master Edit: Fix pending in containerd/containerd#3929).

I then did a bunch more hacking on BuildKit, including fixing a couple of bugs, and commenting out a lot of stuff.

Buildkitd ran, and tried to build me a package, but failed because it didn't copy the Dockerfile over.

PS C:\Users\paulh\Documents\BuildKit\simpleDocker> buildctl.exe --debug build --frontend=dockerfile.v0 --local context=. --local dockerfile=.
[+] Building 0.0s (0/0)
time="2020-01-05T07:47:33+11:00" level=debug msg="serving grpc connection"
[+] Building 0.1s (2/2) FINISHED
 => [internal] load build definition from Dockerfile                                                                     0.1s
 => => transferring dockerfile: 983B                                                                                     0.0s
 => [internal] load .dockerignore                                                                                        0.1s
 => => transferring context: 2B                                                                                          0.0s
error: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: open C:\Users\paulh\AppData\Local\Temp\buildkit-mount017874163\Dockerfile: The system cannot find the file specified.
failed to solve
github.com/moby/buildkit/client.(*Client).solve.func2
        C:/Users/paulh/go/src/github.com/moby/buildkit/client/solve.go:203
github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
        C:/Users/paulh/go/src/github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup/errgroup.go:57
runtime.goexit
        c:/go/src/runtime/asm_amd64.s:1357

I assume this is because I commented out too much, and somehow excluded the code that actually copies things into the snapshots, as both created snapshots were empty despite reporting having transferred stuff. The Dockerfile itself did no transfers from the host OS; it's [MS's trivial Python example](https://github.com/MicrosoftDocs/Virtualization-Documentation/blob/master/windows-container-samples/python/Dockerfile).

PS C:\Users\paulh\Documents\BuildKit\simpleDocker> buildctl.exe --debug du
ID                                                                      RECLAIMABLE     SIZE    LAST ACCESSED
x86vuhy70whikjae56p5wsfmo*                                              true            0B
m733jropkh4azwwgoknhowicq*                                              true            0B
Reclaimable:    0B
Total:          0B
PS C:\Users\paulh\Documents\BuildKit\simpleDocker> buildctl.exe --debug prune
ID                                                                      RECLAIMABLE     SIZE    LAST ACCESSED
m733jropkh4azwwgoknhowicq*                                              true            0B
x86vuhy70whikjae56p5wsfmo*                                              true            0B
Total:  0B

TBBle avatar Jan 04 '20 21:01 TBBle

With #1314, and some more hacking on things, I've gotten to the point where my next failure is coming from inside containerd, or the connection to it.

PS C:\Users\paulh\Documents\BuildKit\supersimpleDocker> buildctl --debug build --frontend=dockerfile.v0 --local context=. --local dockerfile=.
time="2020-01-06T08:03:16+11:00" level=debug msg="serving grpc connection"
[+] Building 4.7s (4/5)
[+] Building 4.7s (5/5) FINISHED
 => [internal] load build definition from Dockerfile                                                                     0.0s
 => => transferring dockerfile: 588B                                                                                     0.0s
 => [internal] load .dockerignore                                                                                        0.0s
 => => transferring context: 2B                                                                                          0.0s
 => [internal] load metadata for mcr.microsoft.com/windows/servercore:1909                                               0.2s
 => CACHED [1/2] FROM mcr.microsoft.com/windows/servercore:1909@sha256:12327ccba5d74921479cc95b56e9422278ac3565740c2a46  0.0s
 => => resolve mcr.microsoft.com/windows/servercore:1909@sha256:12327ccba5d74921479cc95b56e9422278ac3565740c2a46359bf0a  0.0s
 => ERROR [2/2] RUN echo Write-Host -ForegroundColor Red Hello > wr.ps1                                                  4.4s
------
 > [2/2] RUN echo Write-Host -ForegroundColor Red Hello > wr.ps1:
------
error: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to build LLB: executor failed running [powershell -command echo Write-Host -ForegroundColor Red Hello > wr.ps1]: failure waiting for process: rpc error: code = Unknown desc = ttrpc: closed: unknown
failed to solve
github.com/moby/buildkit/client.(*Client).solve.func2
        C:/Users/paulh/go/src/github.com/moby/buildkit/client/solve.go:203
github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
        C:/Users/paulh/go/src/github.com/moby/buildkit/vendor/golang.org/x/sync/errgroup/errgroup.go:57
runtime.goexit
        c:/go/src/runtime/asm_amd64.s:1357

I've pushed one commit that needs more work (breaks the auto tests) plus my hacks onto https://github.com/TBBle/buildkit/tree/hacks_ahoy, in case anyone else wants to play with this.

For reference, I was working with source from containerd/containerd#3929, to fix a blocking bug, and Microsoft/hcsshim#749, to let me build without gcc. For hcsshim, had I not been instrumenting the source, I could have used the nightly binary build of the containerd shim, and I'm planning to suggest/submit that their releases include pushing a container for the container-managed /opt feature, which would avoid hunting down binaries and adding them to the $PATH. (Edit: Microsoft/hcsshim#750)

TBBle avatar Jan 05 '20 21:01 TBBle

The failure I hit in my previous run turned out to be a bug in hcsshim, for which I have posted a fix at microsoft/hcsshim#752.

So now I am able to build a trivial Dockerfile. So trivial it's pointless, except that it worked.

FROM mcr.microsoft.com/windows/servercore:1909
LABEL Description="Built with BuildKit!"
SHELL ["powershell", "-command"]
RUN echo Write-Host -ForegroundColor Red Hello > wr.ps1
CMD ["powershell", ".\\wr.ps1"]

I don't know yet if my containers do not have networking set up properly due to my Buildkit spec-generation hacks, or some other aspect of my setup unrelated to Buildkit.

As well as networking issues, filesystem commands do not function on Windows due to an assertion about idmapping support.

I was worried about API issues, so I had vendored containerd master into buildkit, and hcsshim master into containerd. However, I suspect that this wasn't necessary, and I'll back those out next time I look at this.

I've rebased https://github.com/TBBle/buildkit/tree/hacks_ahoy to the current version of #1314, so it should be relatively easy for anyone who wants to try this out, and perhaps try and turn some of my hacks into further valuable commits.

TBBle avatar Jan 07 '20 19:01 TBBle

@TBBle cool to see someone tackling this. Does your fork handle the alternative <pathOfDockerfile>.dockerignore path for .dockerignore files? That is pretty much the only thing I miss for the moment.

guillaume86 avatar Apr 05 '20 12:04 guillaume86

It probably doesn't, but only because all the file-copy APIs in BuildKit fail an assertion on Windows related to permissions support.

I really should get back to this, it got jammed up behind questions about containerd 1.2 support, and then other stuff came up.

TBBle avatar Apr 05 '20 14:04 TBBle

There is an issue logged on Microsoft Windows Containers repo https://github.com/microsoft/Windows-Containers/issues/34

jorgearteiro avatar Jul 07 '20 15:07 jorgearteiro

Now I'm looking at this again, I realise I previously only tested building into the buildkit cache.

Outputting does not work yet:

  • image, oci, and docker outputs all failed calculating diff pairs due to something not being implemented in containerd. Not sure if this is actually a missing feature, or we just need to use a different containerd API on Windows, like in the mounting. Edit: Looks like a containerd missing feature: https://github.com/containerd/containerd/issues/4394
  • tar and local outputs just capture the sandbox.vhdx for the top layer (an internal detail of the HCS) rather than the contents of the image, as one would expect. Probably related to assumptions around the mount behaviour, which I'm already working around in the container-mounting support.

TBBle avatar Jul 16 '20 06:07 TBBle

I got image, oci, and docker outputs working in containerd in https://github.com/containerd/containerd/pull/4399, so I can now run the (trivial) images I build. So then back to working out how to do non-trivial things in the build script, next week. With a bit of luck I'm now free of any further containerd issues or unimplemented features.

FROM mcr.microsoft.com/windows/servercore:2004
LABEL Description="Built with BuildKit!"
SHELL ["powershell", "-command"]
ENTRYPOINT ["powershell"]
RUN echo "Write-Host -ForegroundColor DarkGreen Hello World" > C:/wr.ps1
CMD ["-command", "C:/wr.ps1"]

buildctl build --frontend dockerfile.v0 --local context=. --local dockerfile=. --output type=image,name=supersimpledocker,oci-mediatypes=true
ctr --namespace buildkit run --rm --tty supersimpledocker tm1

TBBle avatar Jul 17 '20 18:07 TBBle

Small progress report. I now have networking functional for the containerd worker under Windows. It's a minor hassle to set up using BuildKit and containerd directly (as you have to source and configure a CNI plugin yourself, and the Windows CNI landscape is... rough), but Docker provides its own managed network stack to use with BuildKit, so once someone implements the Docker side of the Buildkit integration, it won't be any more hassle than networking under any other setup.

No containerd changes this time, as containerd happily uses whatever CNI setup you pass it.

I now have the below functioning, see #1585 for details.

FROM mcr.microsoft.com/windows/servercore:2004

LABEL Description="Python" Vendor="Python Software Foundation" Version="3.7.3"

RUN powershell.exe -Command \
    $ErrorActionPreference = 'Stop'; \
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; \
    wget https://www.python.org/ftp/python/3.7.3/python-3.7.3.exe -OutFile c:\python-3.7.3.exe ; \
    Start-Process c:\python-3.7.3.exe -ArgumentList '/quiet InstallAllUsers=1 PrependPath=1' -Wait ; \
    Remove-Item c:\python-3.7.3.exe -Force

TBBle avatar Jul 20 '20 15:07 TBBle

It occurs to me... I'm only testing with the containerd backend. Is there any interest in the runc executor working (using runhcs)? I feel like there's a movement away from using runhcs, and I'm not totally sure that this would avoid the use of containerd anyway, as things like the layer differ go through it; I haven't looked at what the runc executor does in this case.

TBBle avatar Jul 21 '20 05:07 TBBle

@TBBle Ideally both would work like in Linux, but one is not a requirement for the other. It seems to me that a worker that doesn't depend on containerd would be even simpler to get working. We should still reuse as much containerd code as possible and avoid duplication. For the differ, this is what the Linux side does as well: it still uses the containerd differ, but it uses the library directly that is vendored into buildkit instead of the gRPC API to the containerd daemon.

tonistiigi avatar Jul 21 '20 05:07 tonistiigi

@TBBle we should also probably prioritize getting some CI running. It is quite hard for all of the current maintainers to actually test any of these changes. It is fine if the current test suite mostly doesn't pass. We can start with some basics like the example you had above. I'm not quite sure how well the CI workers support WCOW. Eventually, we probably want to switch from Travis to GitHub Actions, but we have some build-cache logic that can't be very easily transferred, so it will take time. If GitHub Actions supports what is needed for this, we could initially do something special there for Windows only.

tonistiigi avatar Jul 21 '20 06:07 tonistiigi

The main blocker (my last remaining hack) for bringing this up in CI is refactoring GenerateSpec to not add any Linux elements to the spec, as that triggers LCOW mode.

That's my next task anyway, since that's the last change in my "hacks_ahoy" branch. Once that's in-place, I plan to start trying out the various tests on CI and see which pass. There's still an unmeasured pile of work to make the in-build filesystem support work (I know it currently fails due to rejecting attempts to set permissions), but hopefully I can identify a subset of the tests that can pass.
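
The GenerateSpec refactoring could look roughly like the sketch below. It is written in Python for brevity rather than BuildKit's Go, the function `generate_spec` and its parameters are hypothetical, and the spec is reduced to a handful of OCI runtime-spec fields. The only point it illustrates is that the `linux` section is never emitted when targeting Windows.

```python
def generate_spec(args, cwd, target_os):
    """Build a minimal OCI-style runtime spec for the requested target OS."""
    spec = {
        "ociVersion": "1.0.1",
        "process": {"args": list(args), "cwd": cwd},
    }
    if target_os == "windows":
        # Only Windows-specific fields; no "linux" key at all, since the
        # thread reports that any Linux element triggers LCOW mode.
        spec["windows"] = {"layerFolders": []}
    elif target_os == "linux":
        spec["linux"] = {"namespaces": [{"type": "mount"}, {"type": "pid"}]}
    else:
        raise ValueError(f"unsupported target OS: {target_os}")
    return spec
```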

TBBle avatar Jul 21 '20 06:07 TBBle

A problem with using the vendored containerd for client-side diffing in the runc executor is that the vendored containerd is 1.3, which doesn't support diffing windows-layers. That code exists only in a PR I have open against containerd master. I'm hoping it will land before containerd 1.4 is branched, although the beta series has already started and I don't know how much risk containerd will accept between betas.

I see BuildKit has a filesystem-only differ for windows-layers used on non-Windows platforms; I'm not sure whether it is a viable alternative to the hcs-based tar streaming used on Windows in the meantime, as I haven't looked closely at what differences it might have, c.f. https://github.com/containerd/containerd/pull/4399#issuecomment-660283335

TBBle avatar Jul 21 '20 07:07 TBBle

@TBBle The vendored containerd does not need to be a stable release. We mostly vendor master to get the latest fixes. For the differ, I doubt the current windows-layers thing is usable. It is just for handling the different tar format (Windows has parent Hives/Files directories). I opened an issue to support it natively in https://github.com/containerd/containerd/issues/2469 as well, so we don't need a hack. It would be nice if we could do the opposite as well (build Linux layers on Windows), but that is not a priority at the moment, of course.
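
To illustrate the tar-format difference, here is a rough Python sketch (not BuildKit's actual Go code) that re-roots a plain tar under Files/ and adds an empty Hives/ directory. The function name `to_windows_layer_layout` is made up for this example, and it assumes relative entry names; real Windows layers also carry extra metadata that is left out here.

```python
import io
import tarfile


def to_windows_layer_layout(plain_tar: bytes) -> bytes:
    """Re-root every entry of an uncompressed tar under Files/ and add Hives/."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(plain_tar), mode="r:") as src, \
            tarfile.open(fileobj=out, mode="w", format=tarfile.PAX_FORMAT) as dst:
        # The two parent directories come first in the archive.
        for top in ("Hives", "Files"):
            entry = tarfile.TarInfo(name=top)
            entry.type = tarfile.DIRTYPE
            entry.mode = 0o755
            dst.addfile(entry)
        for member in src:
            data = src.extractfile(member) if member.isreg() else None
            # Drop stale PAX path records so the rewritten names are used.
            member.pax_headers.pop("path", None)
            member.pax_headers.pop("linkpath", None)
            member.name = "Files/" + member.name
            if member.islnk():
                # Hard links point at other archive entries, so re-root them too.
                member.linkname = "Files/" + member.linkname
            dst.addfile(member, data)
    return out.getvalue()
```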

tonistiigi avatar Jul 21 '20 07:07 tonistiigi