testcontainers-node

support building docker images using BuildKit

Open · helmlover opened this issue 2 years ago · 17 comments

The claim "Docker [...] Works out of the box." is no longer true. Docker now uses BuildKit by default, but BuildKit Dockerfile syntax is not supported when building Docker images with testcontainers-node (e.g. with GenericContainer.fromDockerfile(buildContext).build()).

When building a Dockerfile that uses BuildKit features, e.g.

RUN --mount=type=cache,id=maven,target=/root/.m2/repository mvn --batch-mode --no-transfer-progress dependency:resolve dependency:resolve-plugins

through testcontainers-node (with DEBUG='testcontainers*' exported), the output currently looks like:

2023-05-12T08:52:02.265Z testcontainers:build [localhost/43420fa7dc4a:2c9e53cede2c] {"errorDetail":{"message":"the --mount option requires BuildKit. Refer to https://docs.docker.com/go/buildkit/ to learn how to build images with BuildKit enabled"},"error":"the --mount option requires BuildKit. Refer to https://docs.docker.com/go/buildkit/ to learn how to build images with BuildKit enabled"}

Exporting DOCKER_BUILDKIT=1 does not change the problem or the output.
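To see up front which Dockerfile lines will trip the classic builder, a small check like the following can help. It is a hypothetical helper (`findBuildKitOnlyLines` is not part of testcontainers-node or dockerode), and its flag list only covers the two flags that produce "requires BuildKit" errors in this thread, `RUN --mount` and `COPY --chmod`. It does not join line continuations.

```typescript
// Hypothetical table: flags the classic builder rejects with a
// "requires BuildKit" error, keyed by Dockerfile instruction. Only the two
// flags seen in this thread are listed; the list is not exhaustive.
const BUILDKIT_ONLY_FLAGS: Record<string, string[]> = {
  RUN: ["--mount"],
  COPY: ["--chmod"],
};

// Returns the trimmed Dockerfile lines that use one of the flags above.
// Line continuations (a backslash at end of line) are not joined.
function findBuildKitOnlyLines(dockerfile: string): string[] {
  const hits: string[] = [];
  for (const raw of dockerfile.split(/\r?\n/)) {
    const line = raw.trim();
    const match = /^(\w+)\s+(.*)$/.exec(line);
    if (!match) continue;
    const flags = BUILDKIT_ONLY_FLAGS[match[1].toUpperCase()] ?? [];
    // Instruction flags come before the command or sources, so stop
    // scanning at the first token that is not a "--flag".
    for (const token of match[2].split(/\s+/)) {
      if (!token.startsWith("--")) break;
      if (flags.includes(token.split("=")[0])) {
        hits.push(line);
        break;
      }
    }
  }
  return hits;
}
```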

PS: Sibling issue in testcontainers-java: https://github.com/testcontainers/testcontainers-java/issues/2857

helmlover avatar May 12 '23 10:05 helmlover
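The error in the issue arrives as one JSON message in the build output stream, carrying `error` and `errorDetail.message` fields. A minimal sketch for pulling the first such message out of a captured log, assuming the log is newline-delimited and that a line may carry a prefix before the JSON (as the `DEBUG` output does); `firstBuildError` and `needsBuildKit` are hypothetical names:

```typescript
// Shape of the JSON messages streamed back by the build endpoint. Only the
// fields used here are modelled; real messages can carry more.
interface BuildMessage {
  stream?: string;
  error?: string;
  errorDetail?: { message?: string };
}

// Returns the first error message in a newline-delimited build log, or
// undefined if no line reports an error. Anything before the first "{" on a
// line (such as a DEBUG prefix) is ignored; unparsable lines are skipped.
function firstBuildError(log: string): string | undefined {
  for (const line of log.split(/\r?\n/)) {
    const start = line.indexOf("{");
    if (start < 0) continue;
    let msg: BuildMessage;
    try {
      msg = JSON.parse(line.slice(start)) as BuildMessage;
    } catch {
      continue;
    }
    const text = msg.errorDetail?.message ?? msg.error;
    if (text) return text;
  }
  return undefined;
}

// True when the error is the classic builder rejecting BuildKit-only syntax.
function needsBuildKit(message: string): boolean {
  return message.includes("requires BuildKit");
}
```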

Hi @helmlover, see the parent issue: https://github.com/docker/for-linux/issues/1136. BuildKit does not yet seem to be supported over the Docker HTTP API, so it is currently only available via the CLI.

cristianrgreco avatar Jun 05 '23 09:06 cristianrgreco

Hi, is there a plan to add this support?

osa0805 avatar Jan 31 '24 14:01 osa0805

+1, we need support for this. Curious whether there is a workaround to get RUN-cache-like functionality without BuildKit features?

praveensvsrk avatar Mar 29 '24 11:03 praveensvsrk

It looks like what's needed is support in dockerode for creating and running a session. A Go implementation of this was added to the Terraform Docker provider (https://github.com/kreuzwerker/terraform-provider-docker/pull/387/files#diff-4596d40531ae2e21f6074d104e6dc7317537946b56d95df847c9209dfbe30fceR329). The session run code is here

silh avatar Mar 30 '24 21:03 silh

Note that the API does support BuildKit via the version option; this was fixed in the correct repo: https://github.com/moby/moby/blob/master/api/swagger.yaml#L8722-L8731

The linked upstream issue (https://github.com/docker/for-linux/issues/1136) is on a deprecated/seemingly abandoned repo.

mikeseese avatar Jun 11 '24 02:06 mikeseese
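For reference, `version` is a query parameter on the Engine API's `POST /build` endpoint, where `"1"` selects the classic builder and `"2"` selects BuildKit. The sketch below assembles such a query string; `buildQuery` and its options type are hypothetical, and sending `pull` only on request reflects the "no active sessions" failures reported later in this thread rather than anything the API requires.

```typescript
// Hypothetical options; each field maps onto a POST /build query parameter.
interface BuildQueryOptions {
  tag: string; // -> t
  dockerfile?: string; // -> dockerfile (path inside the build context)
  pull?: boolean; // -> pull
  buildKit?: boolean; // -> version=2
}

// Builds the path and query string for the Engine API's POST /build call.
function buildQuery(opts: BuildQueryOptions): string {
  const params = new URLSearchParams({ t: opts.tag });
  if (opts.dockerfile) params.set("dockerfile", opts.dockerfile);
  // Only send pull when explicitly requested: as reported in this thread,
  // forcing a pull makes BuildKit go to the registry, which is where the
  // "no active sessions" error shows up when no session exists.
  if (opts.pull) params.set("pull", "true");
  // "1" = classic builder (the default when omitted), "2" = BuildKit.
  if (opts.buildKit) params.set("version", "2");
  return `/build?${params.toString()}`;
}
```

For example, `buildQuery({ tag: "app:test", buildKit: true })` produces `/build?t=app%3Atest&version=2`. As far as we can tell, dockerode forwards the options object given to `buildImage` as these query parameters, which is how `{ version: "2" }` reaches the daemon.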

The main problem is setting up a websocket connection for the session, which is required for version 2. IIRC, dockerode (or its underlying library docker-modem) didn't support that.

silh avatar Jun 11 '24 06:06 silh

I'm not sure about that, or the requirements for this module (as I'm just bubbling up the finding as I saw others waiting on an issue on an abandoned repo), but I am able to use dockerode's buildImage with { version: "2" } as an option and BuildKit is used.

mikeseese avatar Jun 11 '24 06:06 mikeseese

I don't think that's right. Passing { version: "2" } to dockerode's buildImage doesn't do anything. The build still fails when using some BuildKit dependent feature like --mount=type=cache.

I quickly looked into it at some point, and my impression was that using build version 2 through the Docker API is really not straightforward. I am not sure about any of this, but my impression was: you have to implement a gRPC server on your side, then hijack the HTTP connection to start a session, which allows the Docker daemon to make calls back to your end!? It's baffling API design if you ask me 😁

schummar avatar Jun 11 '24 06:06 schummar
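The flow described above maps onto the Engine API's `POST /session` endpoint: the client upgrades that HTTP connection to `h2c`, hijacks it, and then serves gRPC over it, with the daemon calling back into the client. The sketch below only builds the request headers. The `X-Docker-Expose-Session-*` names come from our reading of moby/buildkit's session package and should be treated as an assumption to verify; `sessionHeaders` is a hypothetical helper, and the gRPC server itself is not shown.

```typescript
// Assumed header names, taken from moby/buildkit's session package as we
// understand it; verify against the BuildKit source before relying on them.
const HEADER_SESSION_ID = "X-Docker-Expose-Session-Uuid";
const HEADER_SESSION_NAME = "X-Docker-Expose-Session-Name";
const HEADER_SESSION_SHARED_KEY = "X-Docker-Expose-Session-Sharedkey";
const HEADER_SESSION_METHOD = "X-Docker-Expose-Session-Grpc-Method";

// Headers for the POST /session request. Once the upgrade succeeds, the
// connection carries HTTP/2 (h2c) and the daemon acts as a gRPC client
// calling the methods advertised here.
function sessionHeaders(
  sessionId: string,
  name: string,
  sharedKey: string,
  grpcMethods: string[],
): Array<[string, string]> {
  const headers: Array<[string, string]> = [
    ["Connection", "Upgrade"],
    ["Upgrade", "h2c"],
    [HEADER_SESSION_ID, sessionId],
    [HEADER_SESSION_NAME, name],
    [HEADER_SESSION_SHARED_KEY, sharedKey],
  ];
  // The method header repeats once per exposed gRPC method, so a list of
  // pairs is used rather than an object holding one value per name.
  for (const method of grpcMethods) {
    headers.push([HEADER_SESSION_METHOD, method]);
  }
  return headers;
}
```

The build request would then refer to this session by its ID; as far as we can tell, Docker's Go client sends it as a `session` query parameter on `POST /build`.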

@schummar I guess my verification was that prior to adding { version: "2" } to the build options object, I would receive the error during building my image:

the --chmod option requires BuildKit. Refer to https://docs.docker.com/go/buildkit/ to learn how to build images with BuildKit enabled

which referenced a COPY --chmod=755 ... line in my Dockerfile

and after adding it, I was able to build the image successfully 🤷 I can close my DefinitelyTyped PR if adding it doesn't support all BuildKit features

mikeseese avatar Jun 11 '24 06:06 mikeseese

@mikeseese that's still a valid option and should be added there. It exists in docker's API description - https://docs.docker.com/engine/api/v1.45/#tag/Image/operation/ImageBuild

silh avatar Jun 11 '24 07:06 silh

Oh, I think you are right! When I was experimenting a few weeks back I could not get this to work. But now I have tried it again and it seems to work! Maybe a new Docker version quietly improved something? Or I have just been doing it wrong all that time 🤣 That's good news, thanks!

schummar avatar Jun 11 '24 07:06 schummar

Phew! I just finished creating a reproduction repo for quick testing; I was about to test --mount=type=cache, but I'll hold off since you verified it yourself 👍 Here's the repo in case it's helpful: https://github.com/mikeseese/dockerode-buildkit

mikeseese avatar Jun 11 '24 07:06 mikeseese

Thank you @mikeseese for sharing the findings here! I don't know if I would've ever found that upstream issue 😄

cristianrgreco avatar Jun 11 '24 10:06 cristianrgreco

I was testing this locally, since a test from my PR adding support to testcontainers-go was also passing for me with the current Docker version. However, when I switched to an older Docker version, it failed. I enabled debug logging to understand why, which led me to these logs:

time="2024-06-11T19:17:05.199031371Z" level=debug msg=resolving host=registry-1.docker.io
time="2024-06-11T19:17:05.199064621Z" level=debug msg="do request" host=registry-1.docker.io request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/0.0.0+unknown request.method=HEAD url="https://registry-1.docker.io/v2/library/alpine/manifests/latest"
time="2024-06-11T19:17:07.605771997Z" level=debug msg="fetch response received" host=registry-1.docker.io response.header.content-length=157 response.header.content-type=application/json response.header.date="Tue, 11 Jun 2024 19:17:09 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.docker-ratelimit-source=80.56.164.134 response.header.strict-transport-security="max-age=31536000" response.header.www-authenticate="Bearer realm=\"https://auth.docker.io/token\",service=\"registry.docker.io\",scope=\"repository:library/alpine:pull\"" response.status="401 Unauthorized" url="https://registry-1.docker.io/v2/library/alpine/manifests/latest"
time="2024-06-11T19:17:07.605922831Z" level=debug msg=Unauthorized header="Bearer realm=\"https://auth.docker.io/token\",service=\"registry.docker.io\",scope=\"repository:library/alpine:pull\"" host=registry-1.docker.io
time="2024-06-11T19:17:07.606130331Z" level=info msg="trying next host" error="no active sessions" host=registry-1.docker.io

This happens when the base image is not present locally before the build. BuildKit will try to download it but won't have auth data (it checks for it even for Docker Hub), and to get that auth data it tries to find a session. This happens somewhere inside github.com/containerd/containerd/remotes/docker/resolver.go: retryRequest -> util/resolver/authorizer.go (dockerAuthorizer.AddResponses) -> sessionauth.GetTokenAuthority -> sessionManager.Any (which should return any session).

After that I returned to the new docker version, wiped all local images and got the same error as with the old version.

Then I decided to check the example repo provided by @mikeseese (https://github.com/mikeseese/dockerode-buildkit). The example there contains code that pulls the base image before building. When I commented out the pull code, deleted the pulled image, and ran it again, I got the same "no active sessions" error.

$ node index.js
Building ./Dockerfile...
ERROR: alpine: no active sessions
ERROR: Failed to build image

silh avatar Jun 12 '24 20:06 silh
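The workaround in the example repo, pulling base images before the build so BuildKit never has to authenticate against a registry, needs the list of external images a Dockerfile starts from. A hypothetical sketch that extracts them: it skips `scratch`, earlier build stages, and `FROM` flags such as `--platform`, and adds `:latest` to untagged names because the Engine API's image-create endpoint pulls every tag when no tag is given. It does not expand `ARG` variables in `FROM` lines.

```typescript
// Adds ":latest" when an image reference has neither a tag nor a digest.
// Only the last path segment is checked, so a registry port such as
// "localhost:5000/app" is not mistaken for a tag.
function withDefaultTag(image: string): string {
  if (image.includes("@")) return image;
  const lastSegment = image.slice(image.lastIndexOf("/") + 1);
  return lastSegment.includes(":") ? image : `${image}:latest`;
}

// Lists external images referenced by FROM instructions, in order and
// without duplicates. Skips "scratch", references to earlier stages
// ("FROM build"), and flags such as --platform. ARG substitution in FROM
// lines (e.g. FROM ${BASE}) is not expanded.
function baseImagesToPull(dockerfile: string): string[] {
  const stages = new Set<string>();
  const images: string[] = [];
  for (const raw of dockerfile.split(/\r?\n/)) {
    const tokens = raw.trim().split(/\s+/);
    if (tokens[0].toUpperCase() !== "FROM") continue;
    const args = tokens.slice(1).filter((t) => !t.startsWith("--"));
    const image = args[0];
    if (!image) continue;
    const isStage = stages.has(image.toLowerCase());
    if (image.toLowerCase() !== "scratch" && !isStage) {
      const tagged = withDefaultTag(image);
      if (!images.includes(tagged)) images.push(tagged);
    }
    // "FROM <image> AS <name>" defines a stage later FROMs may refer to.
    if (args[1]?.toLowerCase() === "as" && args[2]) {
      stages.add(args[2].toLowerCase());
    }
  }
  return images;
}
```

Each returned name could then be pulled (for example with dockerode's `pull`) before the build starts.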

Yeah, it seems like { version: "2" } isn't going to get you full BuildKit support, but it can work in some limited scenarios. There's some discussion at https://github.com/apocas/dockerode/issues/601#issuecomment-2162649440 about adding the gRPC server implementation. This likely explains your conflicting results, @schummar: in one test scenario the base image was probably already pulled, and in the other it wasn't.

Long story short, I think it's safe to say that moby/the Docker engine supports this (from some minimum Docker version on), but each client will need to implement the BuildKit client/server to fully realize that support. So this issue is not blocked by an upstream issue.

mikeseese avatar Jun 12 '24 21:06 mikeseese

Yeah, that's what I also found in #761. It's also what stopped all my past experiments, because, due to a bug (#771), testcontainers currently always sends pull: 'true'. Since we are talking about huge limitations, it's up to the maintainers whether to include support right away (with docs discussing the limitations) or wait until dockerode supports it properly.

schummar avatar Jun 13 '24 06:06 schummar

Made some progress: https://github.com/apocas/dockerode/pull/766 But still a few things to sort out.

schummar avatar Jun 13 '24 15:06 schummar

I get the "no active sessions" problem when using version 2. Is there any way to solve it? I'm anxious to complete my task. Thanks.

lbhbrave avatar Oct 25 '24 04:10 lbhbrave

Hi @schummar. I just saw the dockerode PR was merged and released.

To use it, is it just a case of providing version: "2" to the build image args? We could enable that with a .withBuildkit method, or, if BuildKit is now the default, perhaps we could always send the version flag?

cristianrgreco avatar Mar 22 '25 22:03 cristianrgreco
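The opt-in design suggested above keeps the classic builder as the default and sends `version: "2"` only when a caller asks for it. A minimal sketch of that pattern, using a hypothetical `DockerfileBuild` class that is not the testcontainers-node implementation:

```typescript
// Options object handed to the image build call; "version" is a string
// because the Engine API takes it as a query parameter.
interface ImageBuildOptions {
  t: string;
  version?: "1" | "2";
}

// Hypothetical builder illustrating an opt-in BuildKit flag.
class DockerfileBuild {
  private readonly tag: string;
  private buildkit = false;

  constructor(tag: string) {
    this.tag = tag;
  }

  // Opt in to BuildKit for this build only; returns this for chaining.
  withBuildkit(): this {
    this.buildkit = true;
    return this;
  }

  // Omitting "version" leaves the build endpoint on its default builder.
  toOptions(): ImageBuildOptions {
    return this.buildkit ? { t: this.tag, version: "2" } : { t: this.tag };
  }
}
```

Making it opt-in means existing builds keep their current behaviour, while the limitations discussed in this thread only affect callers who explicitly choose BuildKit.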

@cristianrgreco hey, awesome that there is progress.

While it is cool that we can finally use BuildKit features with testcontainers, I should point out that, at least in my tests, the dockerode BuildKit implementation never worked 100% reliably. I have not had much time to follow up on it recently. But even now, when I run the dockerode test on repeat, it fails about 2% of the time with error messages like "error reading server preface: http2: frame too large". Some time ago I built a prototype Docker API library which worked 100% of the time, but I still could not spot the issue in the dockerode implementation. At least this shows that the Docker API is not the problem here.

Anyway, my point is that the feature should maybe be marked as experimental, with a note of the possibility of failure.

schummar avatar Mar 24 '25 20:03 schummar
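Running a test on repeat, as described above, is a simple way to put a number on such intermittent failures. A minimal sketch, where `task` is a placeholder for one build attempt and `measureFailureRate` is a hypothetical helper; runs are sequential so attempts do not interfere with each other:

```typescript
interface FailureReport {
  failures: number;
  rate: number; // failures / runs
  messages: Map<string, number>; // error message -> occurrences
}

// Runs an async task `runs` times in sequence and tallies the failures,
// grouped by error message (e.g. "http2: frame too large").
async function measureFailureRate(
  task: () => Promise<unknown>,
  runs: number,
): Promise<FailureReport> {
  let failures = 0;
  const messages = new Map<string, number>();
  for (let i = 0; i < runs; i++) {
    try {
      await task();
    } catch (err) {
      failures++;
      const msg = err instanceof Error ? err.message : String(err);
      messages.set(msg, (messages.get(msg) ?? 0) + 1);
    }
  }
  return { failures, rate: runs === 0 ? 0 : failures / runs, messages };
}
```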

Thanks for the update @schummar. I've added an optional withBuildkit() method which users can opt-in to, so I'm not so concerned about the change.

I've added a couple of BuildKit tests which run for every build, and so far there have been no issues.

We'll keep up to date with dockerode, so as BuildKit support improves there, it will improve here too. Thanks again for your help and for your contribution to dockerode!

cristianrgreco avatar Mar 24 '25 20:03 cristianrgreco