Deploying a Flow with Docker Desktop in Europe raises BuildError: failed to export image: NotFound: content digest

Open anze3db opened this issue 1 year ago • 2 comments

Bug summary

Creating a deployment with a Docker image on macOS with Docker Desktop fails with the following error, but only if you are located in Europe:

  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 399, in coroutine_wrapper
    return run_coro_as_sync(ctx_call())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 243, in run_coro_as_sync
    return call.result()
           ^^^^^^^^^^^^^
  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 312, in result
    return self.future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 182, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/zidar/.asdf/installs/python/3.12.1/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 383, in _run_async
    result = await coro
             ^^^^^^^^^^
  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 225, in coroutine_wrapper
    return await task
           ^^^^^^^^^^
  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 389, in ctx_call
    result = await async_fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/deployments/runner.py", line 925, in deploy
    image.build()
  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/docker/docker_image.py", line 73, in build
    build_image(**build_kwargs)
  File "/Users/zidar/.asdf/installs/python/3.12.1/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/zidar/programming/app/.venv/lib/python3.12/site-packages/prefect/utilities/dockerutils.py", line 194, in build_image
    raise BuildError(event["error"])
prefect.utilities.dockerutils.BuildError: failed to export image: NotFound: content digest sha256:7fb66093b170bccb413f3e1c8f4b92fa440ea68fc4cddccf4c3b47e2673cfb9c: not found

If you change your location (with a VPN) to the US, the issue does not reproduce. If you use OrbStack instead of Docker Desktop, the issue also does not reproduce.

We figured this out because coworkers in the US had no trouble creating the deployment, while those in the EU consistently get the error.

Example deployment code:

job.deploy(
    work_pool_name=work_pool_name,
    image=DockerImage(
        name=docker_image_name,
        platform="linux/amd64",
        dockerfile="Dockerfile",
        target=target,
    ),
)

Building the Dockerfile manually doesn't raise this error.
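
To narrow down whether the failure comes from Prefect itself or from the Docker Engine API path, a minimal sketch along these lines may help. It assumes Prefect builds through docker-py, that the docker package is installed, and that a daemon is running. The function names, the default platform argument, and the optional labels argument are illustrative only. The traceback above shows BuildError being raised from an "error" entry in the build event stream, so the sketch looks for the same entry.

```python
from typing import Iterable, Optional


def first_build_error(events: Iterable[dict]) -> Optional[str]:
    """Return the first error message in a decoded build event stream, if any."""
    for event in events:
        if "error" in event:
            return event["error"]
    return None


def build_via_engine_api(
    context: str,
    tag: str,
    platform: Optional[str] = "linux/amd64",
    labels: Optional[dict] = None,
) -> Optional[str]:
    """Build through the Docker Engine API (not the docker CLI) and report any error.

    Illustrative helper, not part of Prefect.
    """
    import docker  # imported lazily; requires docker-py and a running daemon

    client = docker.from_env()
    events = client.api.build(
        path=context,
        tag=tag,
        platform=platform,
        labels=labels,
        decode=True,  # yield dicts instead of raw JSON bytes
    )
    return first_build_error(events)
```

Calling build_via_engine_api(".", "repro:latest") from the project directory would return None on success or the error string on failure. The tag here is a placeholder.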

Version info

Version:             3.1.0
API version:         0.8.4
Python version:      3.12.6
Git commit:          a83ba39b
Built:               Thu, Oct 31, 2024 12:43 PM
OS/Arch:             darwin/arm64
Profile:             local
Server type:         server
Pydantic version:    2.9.2
Integrations:
  prefect-docker:    0.6.1

Additional context

This issue has also popped up in the Prefect Community Slack: https://prefect-community.slack.com/archives/CL09KU1K7/p1730205889746789

anze3db, Nov 05 '24 16:11

Just out of curiosity, does this also occur when explicitly passing the registry name in the image?

docker.io/your_username/image:tag

teocns, Nov 06 '24 09:11

I haven't tried docker.io, but the issue reproduces with AWS ECR, no matter how I specify the registry name in the image.

The issue even reproduces if neither ECR nor docker.io is configured and I'm building and using the image locally without pushing it to a remote repository.

anze3db, Nov 06 '24 11:11

We found a workaround for the issue, but haven't pinpointed the exact culprit yet.

If one removes platform="linux/amd64", like so:

job.deploy(
    work_pool_name=work_pool_name,
    image=DockerImage(
        name=docker_image_name,
        #platform="linux/amd64",
        dockerfile="Dockerfile",
        target=target,
    ),
)

and runs the deployment, then adds platform="linux/amd64" back in, the deployment goes through successfully on the second attempt. This works even if the first run fails (in our case it does, but only because of an unrelated issue with a Go package).
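
The two-step workaround could be scripted roughly as follows. This is only a sketch of the sequence described above, not a verified fix: the build callable is hypothetical and would wrap the job.deploy(...) call, passing platform to DockerImage only when it is not None.

```python
from typing import Callable, Optional


def build_with_warmup(build: Callable[[Optional[str]], None], platform: str) -> bool:
    """Run one build without a platform, then the real build with it.

    Returns True if the warm-up build succeeded, False if it failed and
    was ignored. Any failure of the second build propagates to the caller.
    """
    warmup_ok = True
    try:
        build(None)  # first pass: no explicit platform
    except Exception as exc:
        # The report says the second attempt works even if this one fails.
        warmup_ok = False
        print(f"warm-up build failed, continuing: {exc}")
    build(platform)  # second pass: the platform we actually want
    return warmup_ok
```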

I can't reproduce the issue at the moment, so I can't gather more data, but the workaround worked for @anze3db today when he ran into it again. When I was debugging the problem, the only thing that stood out and might be related was:

$ tail -f ~/Library/Containers/com.docker.docker/Data/log/vm/dockerd.log
...
time="2025-02-07T23:41:30.272689258Z" level=warning msg="failed to determine platform specific size" digest="sha256:6365712bd66a08e836f2308a17f0fef28f3358bc0249fd6e87fdc4ee7cb000f7" error="NotFound: content digest sha256:6365712bd66a08e836f2308a17f0fef28f3358bc0249fd6e87fdc4ee7cb000f7: not found" image="docker.io/prefecthq/prefect:3.0.11-python3.12" isPseudo=false manifest="{application/vnd.docker.distribution.manifest.v2+json sha256:6365712bd66a08e836f2308a17f0fef28f3358bc0249fd6e87fdc4ee7cb000f7 3256 [] map[] [] 0x4001d2f0e0 }"
...

So maybe something with platform isn't propagated and built correctly? But that's just speculation 🤷
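
For anyone collecting more data, a small sketch like the one below could pull the missing digests and the affected image names out of dockerd.log. The regular expressions are written against the single warning line quoted above, so treat the field layout as an assumption. The function name is made up.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Patterns derived from the one warning line quoted above.
_DIGEST_NOT_FOUND = re.compile(r"NotFound: content digest (sha256:[0-9a-f]+): not found")
_IMAGE = re.compile(r'image="([^"]+)"')


def missing_digests(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (digest, image) for every log line reporting a missing content digest."""
    for line in lines:
        digest = _DIGEST_NOT_FOUND.search(line)
        if digest is None:
            continue
        image = _IMAGE.search(line)
        yield digest.group(1), (image.group(1) if image else None)
```

Passing an open file handle for the log shown above to missing_digests(...) would list each digest next to the image tag it was reported for.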

uskudnik, Feb 13 '25 01:02

Just an update that we've spent some more time debugging this with Docker employees and we've opened an issue about this on their Docker Desktop for Mac repo: https://github.com/docker/for-mac/issues/7607

One interesting thing we found is that the issue doesn't reproduce if you remove the labels parameter from the API call, i.e. by removing this line:

https://github.com/PrefectHQ/prefect/blob/99d94359bf5ea0f2f8a61e22c49fb25f8d7c7e33/src/prefect/utilities/dockerutils.py#L173

@teocns I'm not sure if removing labels from images will break anything, but it would resolve this particular issue. Ideally though, Prefect should be using BuildKit to build images, but that's probably more work on your side because BuildKit isn't supported by docker-py.
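
As a rough illustration of what a BuildKit-based build might look like from Python, the sketch below shells out to the docker buildx CLI instead of going through docker-py. This is not how Prefect builds images today. The helper names are made up, and the defaults simply mirror the deployment example from the bug summary.

```python
import subprocess
from typing import List, Optional


def buildx_command(
    tag: str,
    dockerfile: str = "Dockerfile",
    platform: str = "linux/amd64",
    target: Optional[str] = None,
    context: str = ".",
) -> List[str]:
    """Assemble a `docker buildx build` invocation for a single-platform image."""
    cmd = [
        "docker", "buildx", "build",
        "--platform", platform,
        "--file", dockerfile,
        "--tag", tag,
        "--load",  # load the result into the local image store
    ]
    if target:
        cmd += ["--target", target]
    cmd.append(context)
    return cmd


def run_buildx(tag: str, **kwargs) -> None:
    """Run the build and raise CalledProcessError if it fails."""
    subprocess.run(buildx_command(tag, **kwargs), check=True)
```

The image built this way could then be referenced by name in the deployment, provided Prefect is told not to rebuild it.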

anze3db, Mar 05 '25 11:03

This appears to have been closed upstream.

cicdw, May 13 '25 01:05