
Issues with buildx

Open kuza55 opened this issue 1 year ago • 21 comments

Describe the bug: I am trying to use the new buildx features from source.

I have a multi-stage build with a docker_image target that builds on top of a base target.

With the regular builder, things work fine, but with the buildx builder, I am running into this error today:

Dockerfile:2
--------------------
   1 |     ARG BASE_IMAGE=docker/base:base
   2 | >>> FROM $BASE_IMAGE
   3 |
   4 |     RUN pip install opentelemetry-distro opentelemetry-exporter-otlp
--------------------
ERROR: failed to solve: europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/base:0.1: failed to authorize: failed to fetch oauth token: unexpected status: 403 Forbidden

The error here is probably unrelated to pants, docker is failing to pull this image from a remote repo, however docker should not need to pull this image since it exists locally because it was just built by pants:

$ docker images
REPOSITORY                                                                                       TAG               IMAGE ID       CREATED          SIZE
europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/base                     0.1               f8d553f88c7e   14 hours ago     534MB

When I disable buildx, my build works fine.

Pants version: From source on main

OS: Ubuntu via WSL

kuza55 avatar Nov 15 '23 19:11 kuza55

Hi, thanks for reporting.

Just to confirm, is the output field for your docker_image target using the default value of {"type": "docker"}?

Also, please include the git sha of your source version, as main is a moving target ;)

kaos avatar Nov 15 '23 19:11 kaos

Git SHA is 7e15e5c3c2d3c0f0e944f697ae6a0c550249aad0

I am using the default value for type.

I am not using a multi-platform build afaik; I am using a multi-stage build where the stages are separate docker_image targets.

kuza55 avatar Nov 15 '23 19:11 kuza55

Git SHA is 7e15e5c

👍🏽

I am using the default value for type.

I am not using a multi-platform build afaik; I am using a multi-stage build where the stages are separate docker_image targets.

yea, I mis-read that for a split second.. just picked up the multi-.. part ;p

kaos avatar Nov 15 '23 20:11 kaos

How do you stitch multiple docker_image targets into a single multi-stage build? Or, I guess it's not a multi-stage build in Docker terms, perhaps? (i.e. in Docker a multi-stage build is one docker build using a single Dockerfile with multiple images defined in it.)

Do you see the europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/base:0.1 image when you run docker images ?

kaos avatar Nov 15 '23 20:11 kaos

I have 2 targets,

# docker/base/BUILD
docker_image(
    name="base",
    image_tags=["0.1"],
    registries=[
        "@gcp",
    ],
    cache_to={
        "type": "registry",
        "ref": "europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/cache:latest",
        "mode": "max"
    },
    cache_from={
        "type": "registry",
        "ref": "europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/cache:latest"
    }
)
# docker/prod/BUILD
docker_image(
    name="prod",
    image_tags=["0.1"],
    cache_to={
        "type": "registry",
        "ref": "europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/prod/cache:latest",
        "mode": "max"
    },
    cache_from={
        "type": "registry",
        "ref": "europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/prod/cache:latest"
    }
)

And then docker/prod/Dockerfile starts with

ARG BASE_IMAGE=docker/base:base
FROM $BASE_IMAGE

I am not super clear on how pants stitches these together or what the terminology is if not a multi-stage build.

And yes, I do see the relevant image when I run docker images; I included a snippet of that output in the original post.

kuza55 avatar Nov 15 '23 21:11 kuza55

OK, I'm not sure what's up with buildx here. Perhaps @riisi has more insights?

Regarding what Pants does, it simply chains multiple docker builds, something ~eq. to:

docker build -t base:0.1 src/base
docker build -t prod:0.1 --build-arg BASE_IMAGE=base:0.1 src/prod

and the "magic" here is the build arg BASE_IMAGE, so pants passes in the image name of your base image that you identified using a default value pointing to its target address.
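To make the chaining concrete, here is a minimal Python sketch of how the two commands above could be assembled. It is illustrative only and not Pants's actual implementation: `ImageTarget` and `build_commands` are hypothetical names, and the image names and paths are the ones from the example commands.

```python
from dataclasses import dataclass, field


@dataclass
class ImageTarget:
    """Hypothetical stand-in for a docker_image target (not a Pants API)."""

    image: str    # resulting image name, e.g. "base:0.1"
    context: str  # build context directory
    # Maps a Dockerfile ARG name to the upstream target it should resolve to.
    base_args: dict = field(default_factory=dict)


def build_commands(targets):
    """Return one `docker build` argv per target, in the order given.

    Upstream targets must be listed before the targets that depend on them.
    """
    commands = []
    for target in targets:
        argv = ["docker", "build", "-t", target.image]
        for arg_name, upstream in target.base_args.items():
            # The ARG default in the Dockerfile is a target address; the
            # build arg overrides it with the upstream image's real name.
            argv += ["--build-arg", f"{arg_name}={upstream.image}"]
        argv.append(target.context)
        commands.append(argv)
    return commands


base = ImageTarget(image="base:0.1", context="src/base")
prod = ImageTarget(image="prod:0.1", context="src/prod", base_args={"BASE_IMAGE": base})

for argv in build_commands([base, prod]):
    print(" ".join(argv))
```

The point of the sketch is that the downstream build only ever receives an image name; whether that name is resolved from the local image store or from a registry is decided by the builder, not by Pants.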

kaos avatar Nov 15 '23 23:11 kaos

however docker should not need to pull this image since it exists locally because it was just built by pants

I would have thought when you are using the --cache-from option, the second image build will still try to fetch the remote cache from the registry, so it will need to have creds. That said, failure to retrieve the cache shouldn't result in a failed build.

I'm wondering if this is a more general problem with fetching/pushing to GCP - have you tried building and pushing a single image?

After you build (package) the image with Pants, can you see the image exists locally with docker images ?

What do you have in pants.toml?

riisi avatar Nov 16 '23 12:11 riisi

This is actually separate from my attempts to use caching.

Here is my toml file:

[docker]
use_buildx = true
env_vars = [
  "DOCKER_CONFIG=%(homedir)s/.docker",
  "DOCKER_BUILDKIT=0",
  "HOME",
  "AWS_PROFILE=apricot",
]
tools = [
  "docker-credential-gcloud", # or docker-credential-gcloud when using artifact registry
  "dirname",
  "readlink",
  "python3",
  # These may be necessary if using Pyenv-installed Python.
  "cut",
  "sed",
  "bash",
  # This is for aws
  "docker-credential-ecr-login",
  "getent"
]
default_repository = "{directory}/{name}"

[docker.registries.gcp]
default = true
address = "europe-west4-docker.pkg.dev"
repository = "smart-shoreline-391915/espresso-docker/{directory}/{name}"

And yes, as noted in my first post, I see the image when I run docker images.

I have read that buildx has its own cache, but I have been unable to figure out how to inspect it.

To be clear, my main concern here is not whether I can build the image or not, but what the provenance of the image is when I have my credentials configured correctly and whether it will unnecessarily read things from the network when it already has the image locally.

kuza55 avatar Nov 16 '23 15:11 kuza55

You may need to enable the containerd image store (although actually that may be needed for multiplatform builds only).

Running with the Pants option --docker-build-verbose may help troubleshoot this - this will give you the docker CLI commands that Pants is constructing as @kaos alluded to above. E.g.

18:42:36.36 [INFO] stdout: "['/usr/local/bin/docker', 'buildx', 'build', '--cache-from=type=inline', '--cache-to=type=inline', '--output=type=docker', '--pull=False']"
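If you want to re-run the logged command by hand, a small helper can turn that log line back into a shell command. This is a sketch that assumes the argv list is printed as a Python list literal wrapped in double quotes, exactly as in the sample line above; `command_from_log_line` is a hypothetical name, not part of Pants.

```python
import ast
import shlex


def command_from_log_line(line: str) -> str:
    """Turn a logged argv list into a shell-quoted command string.

    Assumes the argv list appears as a Python list literal wrapped in
    double quotes, as in the sample log line.
    """
    start = line.index('"[') + 1  # index of the opening bracket
    end = line.rindex(']"') + 1   # one past the closing bracket
    argv = ast.literal_eval(line[start:end])
    return shlex.join(str(part) for part in argv)


sample = '''18:42:36.36 [INFO] stdout: "['/usr/local/bin/docker', 'buildx', 'build', '--cache-from=type=inline', '--cache-to=type=inline', '--output=type=docker', '--pull=False']"'''
print(command_from_log_line(sample))
```

Running the resulting command from the build sandbox is a quick way to tell whether a failure comes from Pants or from docker itself.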

riisi avatar Nov 16 '23 16:11 riisi

I see pants running this command:

docker buildx build --pull=False --tag europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/prod/prod:0.1 --build-arg BASE_IMAGE=europe-west4-docker.pkg.dev/smart-shoreline-391915/espresso-docker/base/base:0.1 --file docker/prod/Dockerfile .

Which reproduces the error when I run it from the sandbox. When I remove the buildx fragment, it builds fine despite auth errors.

After a bit more googling it seems like this is "expected" for buildx:

https://github.com/moby/moby/issues/42893 https://github.com/moby/buildkit/issues/2343

Presumably this means the build requires a dependency on a remote registry, though it does make me a bit concerned that the build depends on what is on a remote service and has potential race conditions if someone else writes to the same tag.

kuza55 avatar Nov 16 '23 16:11 kuza55

Enabling the containerd image store did not help, but also seems like a hack around the nonhermetic nature of buildx.

kuza55 avatar Nov 16 '23 17:11 kuza55

potential race conditions if someone else writes to the same tag.

Wondering if there's a solution to this by using Pants to generate a deterministic hash in the tag.

riisi avatar Nov 16 '23 17:11 riisi

I think a deterministic hash tag would be great.

Showing these tags in build output and providing a way to access them in other build commands would also be great if possible (e.g. a shell target to start a container).

I have not dug too deep into what pants already supports here with git hashes etc, but I do want to get to a workflow where multiple people can build, publish & run containers without tripping over each other and not needing to manually edit tags in a repo.

kuza55 avatar Nov 16 '23 17:11 kuza55

The deterministic hash part should already be possible. I'm not sure re. the rest. Probably worth asking / searching in Slack for these types of questions.

riisi avatar Nov 16 '23 17:11 riisi

I think you pasted the wrong link for the deterministic hashes?

It feels like it should be the standard though? Having builds be nonhermetic, even when the underlying fault is with docker, feels like a footgun.

kuza55 avatar Nov 16 '23 17:11 kuza55

Fixed. Yes, I agree it would be nice to have a default (or even some recommendations) to avoid this.

riisi avatar Nov 16 '23 17:11 riisi

regarding a stable hash, see the {pants.hash} interpolation value from https://www.pantsbuild.org/docs/tagging-docker-images#string-interpolation-using-placeholder-values
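To illustrate why a content-derived tag avoids the race on a shared tag, here is a small Python sketch. How Pants actually computes `{pants.hash}` is not shown in this thread, so `stable_hash` and `render_tag` are hypothetical helpers and the choice of hashed inputs is an assumption made purely for illustration.

```python
import hashlib
from types import SimpleNamespace


def stable_hash(inputs: dict) -> str:
    """Hash build inputs in sorted order so the result is reproducible."""
    digest = hashlib.sha256()
    for name in sorted(inputs):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(inputs[name])
        digest.update(b"\0")
    return digest.hexdigest()[:12]


def render_tag(template: str, inputs: dict) -> str:
    """Fill a `{pants.hash}`-style placeholder in a tag template."""
    return template.format(pants=SimpleNamespace(hash=stable_hash(inputs)))


# ASSUMPTION: only the Dockerfile content is hashed here; a real build
# would have more inputs (context files, build args, and so on).
inputs = {
    "Dockerfile": b"ARG BASE_IMAGE=docker/base:base\nFROM $BASE_IMAGE\n",
}
print(render_tag("0.1-{pants.hash}", inputs))
```

Two people building from identical inputs get the same tag, and any change to the inputs produces a different tag, so nobody overwrites an image that someone else's build is about to consume.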

kaos avatar Nov 16 '23 20:11 kaos

Based on the extensive discussion on this issue, I think this can potentially be solved by either a) aliasing the upstream image in the build context of the downstream image to point to a local image or b) setting the value of the build arg in pants to the address of the local image directly. I'm going to set aside some time either today or tomorrow to experiment with this and prove out that this works.

ndellosa95 avatar Dec 07 '23 14:12 ndellosa95

Okay so I did some exploration here - unfortunately I was unable to get something working. Buildx drivers other than the default docker driver are totally unable to pull images from the local image store; they can only pull images from a registry.

There is a solution here though, which is to use buildx bake to package images with buildx - instead of doing these docker builds separately, pants could map the docker builds into a single bake file and then call the bake command. I am going to experiment with this now and confirm it works as I anticipate.
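For readers unfamiliar with bake, the sketch below generates a JSON bake definition that links the two images from this issue. It is an assumption about what such a mapping could look like, not the file that was actually tested here: `bake_definition` is a hypothetical helper, and `base_image` is an illustrative alias that is passed as the BASE_IMAGE build arg and also declared as a named context pointing at the `base` target.

```python
import json


def bake_definition(base_dockerfile: str, prod_dockerfile: str) -> dict:
    """Build a bake definition with two linked targets.

    ASSUMPTION: "base_image" is an illustrative alias. It is passed as the
    BASE_IMAGE build arg and declared as a named context that points at
    the "base" target, so `FROM $BASE_IMAGE` is intended to resolve to
    that target's result rather than to a registry pull.
    """
    return {
        "group": {"default": {"targets": ["prod"]}},
        "target": {
            "base": {"context": ".", "dockerfile": base_dockerfile},
            "prod": {
                "context": ".",
                "dockerfile": prod_dockerfile,
                "args": {"BASE_IMAGE": "base_image"},
                "contexts": {"base_image": "target:base"},
            },
        },
    }


definition = bake_definition("docker/base/Dockerfile", "docker/prod/Dockerfile")
print(json.dumps(definition, indent=2))
```

Saved as `docker-bake.json`, a definition like this would be built with `docker buildx bake -f docker-bake.json`. Because the dependency between the two targets is declared inside one bake invocation, the builder does not need to find the base image in a registry or in the local image store.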

ndellosa95 avatar Dec 07 '23 17:12 ndellosa95

Confirmed that bake works pretty nicely!

ndellosa95 avatar Dec 07 '23 18:12 ndellosa95

@ndellosa95 I've looked into the issue with multiple dependent images and was able to get it working using the Containerd Image Store.

Here's what I've used to test this locally (M2 Mac):

# BUILD
docker_image(
    name="base",
    source="Dockerfile.base",
    cache_to={"type": "local", "dest": "/tmp/docker/pants-test-cache"},
    cache_from={"type": "local", "src": "/tmp/docker/pants-test-cache"},
    build_platform=["linux/amd64", "linux/arm64"],
)

docker_image(
    name="final",
    source="Dockerfile.final",
    cache_to={"type": "local", "dest": "/tmp/docker/pants-test-cache"},
    cache_from={"type": "local", "src": "/tmp/docker/pants-test-cache"},
    build_platform=["linux/amd64", "linux/arm64"],
)
# Dockerfile.base
FROM python:3.8
RUN echo "base image" >> base.txt
# Dockerfile.final
ARG PARENT=:base
FROM ${PARENT}

RUN cat base.txt && \
  echo "final image" >> final.txt
# pants.toml (relevant config only)
[GLOBAL]
pants_version = "2.19.0rc3"

[docker]
use_buildx=true
# (Note that I didn't need to map any env vars)

Enable the containerd image store - either using Docker Desktop or by setting Docker Engine config via /etc/docker/daemon.json:

{
  "features": {
    "containerd-snapshotter": true
  }
}

Switch to the default "docker" build driver (i.e., do not use docker-container) - e.g., docker builder use desktop-linux (or default).
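For the daemon.json step above, if the file already has other settings you will want to merge the flag in rather than overwrite the file. A minimal Python sketch of that merge follows; `enable_containerd_snapshotter` is a hypothetical helper, and the `existing` config is made-up sample content. Writing the result back to /etc/docker/daemon.json and restarting the Docker daemon are left out.

```python
import json


def enable_containerd_snapshotter(config_text: str) -> str:
    """Return daemon.json text with the containerd snapshotter feature on.

    Existing keys, including other entries under "features", are kept.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    features = config.setdefault("features", {})
    features["containerd-snapshotter"] = True
    return json.dumps(config, indent=2)


# Sample content only; your own daemon.json will differ.
existing = '{"log-driver": "json-file", "features": {"buildkit": true}}'
print(enable_containerd_snapshotter(existing))
```

An empty or missing file is treated as an empty config, so the same helper covers a fresh install.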

Note I ran into this issue on my machine causing the error docker: 'buildx' is not a docker command. and was able to resolve it by creating a symlink per this comment.

I was also able to test this successfully with GitHub Actions.

@ndellosa95 Would you be able to check if this helps with your use case?

I'm going to put in a PR to update the docs to suggest this as the recommended approach. The containerd image store is in beta, but as far as I can see it has been stable for a while and has no obvious limitations.

Here is a PR to update the example-docker repo.

riisi avatar Jan 05 '24 05:01 riisi

Closing this as I believe it's fixed since 2.19 with the above approach - let me know otherwise.

riisi avatar Mar 07 '24 06:03 riisi