buildx
[Feature Request] Remove duplicate context transfers
As an example use case, you may have a single intermediate target in your Dockerfile that acts as a cache for all downstream images. You build all your packages in the cache target and then assemble them into different binaries for the different services that make up your app.
Today if you have n targets all of which are downstream from a single target, even if the upstream is the only thing that actually relies on the context, the context will be transferred n times in parallel.
Example
- Make a large context with
dd if=/dev/zero of=largefile count=262400 bs=1024
- Make a Dockerfile with one upstream image that actually uses the context, and three downstream images.
FROM scratch AS upstream
COPY largefile /largefile
FROM upstream AS downstream0
FROM upstream AS downstream1
FROM upstream AS downstream2
- Make a bake hcl with the targets
group "default" {
targets = ["downstream0", "downstream1", "downstream2"]
}
target "downstream0" {
target = "downstream0"
tags = ["docker.io/rabrams/downstream0"]
}
target "downstream1" {
target = "downstream1"
tags = ["docker.io/rabrams/downstream1"]
}
target "downstream2" {
target = "downstream2"
tags = ["docker.io/rabrams/downstream2"]
}
- Run docker buildx bake and observe duplicate context transfers
$ docker buildx bake
[+] Building 1.1s (6/12)
=> [downstream1 internal] load .dockerignore 0.2s
=> => transferring context: 2B 0.1s
=> [downstream2 internal] load build context 0.9s
=> => transferring context: 327.77kB 0.9s
=> [downstream0 internal] load build context 0.9s
=> => transferring context: 360.55kB 0.9s
=> [downstream1 internal] load build context 0.9s
=> => transferring context: 229.45kB 0.9s
For large contexts and many downstream images this can be a problem because your uplink is divided between all the context transfers that are doing the same thing.
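To put rough numbers on this, the sketch below estimates upload time when the same context is sent several times over one uplink. The helper name `uploadSeconds` and the 10 MB/s uplink are assumptions for illustration only; the context size is the one produced by the dd command in the example. It ignores protocol overhead and assumes the parallel transfers split the bandwidth evenly.

```go
package main

import "fmt"

// uploadSeconds is a hypothetical helper (not part of buildx). It estimates the
// time to push `transfers` copies of a context of contextBytes over one uplink,
// assuming the copies share the bandwidth and there is no other overhead.
func uploadSeconds(contextBytes int64, transfers int, uplinkBytesPerSec float64) float64 {
	return float64(contextBytes) * float64(transfers) / uplinkBytesPerSec
}

func main() {
	const contextBytes = 262400 * 1024 // size of largefile created by dd in the example
	const uplink = 10e6                // assumed uplink speed: 10 MB/s

	// Compare a single shared transfer with one transfer per downstream target.
	for _, n := range []int{1, 3} {
		fmt.Printf("%d transfer(s): %.1fs\n", n, uploadSeconds(contextBytes, n, uplink))
	}
}
```

Under these assumptions the total upload time grows linearly with the number of targets, which is why sending the context once matters more as targets are added.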
btw, my old branch that did this is at https://github.com/tonistiigi/buildx/compare/bake...bake-shared-session, but it could use some cleanup
Hi @tonistiigi, I want to fix this issue. I would appreciate any help, for example: which part of the codebase do I need to look at? Some specific questions:
- How is context transferred to the daemon?
- Where does the code that evaluates the bake hcl live? How does it determine which contexts the daemon needs?
- How long is the feedback loop for testing changes?
Any high level description is good enough for me to get started.
I looked at your commit, but the code seems significantly different now.
Hey @tonistiigi, I think I have some understanding of the codebase now. It looks like we need to provide a shared session to establish only one connection to the docker daemon. Please correct me if I am wrong: I would really appreciate your help here!
- In your referenced commit, we create a new shared session for every new target. Is that desirable? Shouldn't we have just one "shared session" for all the input targets?
- Does a shared session map 1:1 to a context transfer, i.e. will every new "shared session" lead to a new context transfer?
- Our requirement is that all the build files use the same context, i.e. "."
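One way to picture the "one shared session for all targets with the same context" idea from the questions above is to group targets by their normalized context path, so that each distinct context would need only one transfer. This is a minimal sketch under that assumption, not how buildx implements it: the `Target` type and `groupByContext` function are hypothetical names and do not correspond to the real bake or session code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// Target is a hypothetical, simplified stand-in for a bake target.
type Target struct {
	Name    string
	Context string // build context path, e.g. "."
}

// groupByContext maps each distinct context path (after cleaning) to the
// sorted names of the targets that use it. Each key represents one context
// that would need to be transferred once and then shared.
func groupByContext(targets []Target) map[string][]string {
	groups := make(map[string][]string)
	for _, t := range targets {
		key := filepath.Clean(t.Context) // "." and "./" become the same key
		groups[key] = append(groups[key], t.Name)
	}
	for _, names := range groups {
		sort.Strings(names)
	}
	return groups
}

func main() {
	targets := []Target{
		{Name: "downstream0", Context: "."},
		{Name: "downstream1", Context: "./"},
		{Name: "downstream2", Context: "."},
	}
	groups := groupByContext(targets)
	fmt.Printf("%d targets, %d distinct context(s)\n", len(targets), len(groups))
	for ctx, names := range groups {
		fmt.Printf("context %q shared by %v\n", ctx, names)
	}
}
```

With the three targets from the example, all of which use ".", the grouping yields a single context, which matches the requirement stated in the last question.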