buildx
buildx copied to clipboard
monitor: Enable to run build and invoke in background
#1104
This commit enables to launch builds in a background process and adds some related commands. Please give me feedbacks on the design of the detaching and the CLI.
This covers PR4 and PR5 of #1104.
PR4 Refactor build command to invoke builds in a background process. This is important as we don't want the lifecycle of "debugging session" to be locked into a single process. A socket should be created under ~/.buildx and even if the current process (unexpectedly) dies its state can be accessed again via the socket.
PR5 Add list and attach commands to the monitor mode. List would show all the current active sessions (via the socket described in the previous section). Attach would make that session active in current process. If the session is already active in another process these processes would detach and go to the monitor mode.
Example:
BUILDX_EXPERIMENTAL=1
is required to enable detached buildx.
FROM ghcr.io/stargz-containers/busybox:1.32.0-org
RUN echo hello > /hello
buildx build
$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx build --invoke sh /tmp/ctx
[+] Building 3.8s (6/6) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 111B 0.0s
=> [internal] load .dockerignore 0.1s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for ghcr.io/stargz-containers/busybox:1.32.0-org 2.6s
=> [auth] stargz-containers/busybox:pull token for ghcr.io 0.0s
=> [1/2] FROM ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678 0.9s
=> => resolve ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678 0.0s
=> => sha256:ea97eb0eb3ec0bbe00d90be500431050af63c31dc0706c2e8fb53e16cff7761f 764.63kB / 764.63kB 0.8s
=> => extracting sha256:ea97eb0eb3ec0bbe00d90be500431050af63c31dc0706c2e8fb53e16cff7761f 0.1s
=> [2/2] RUN echo hello > /hello 0.1s
Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ #
The above command launches two buildx processes: client and server. The server is detached in a separated session. The detached buildx server actually runs the build and the foreground client buildx forwards I/O and provides monitor prompt.
ps command on the host:
$ ps -C buildx -o pid,ppid,pgid,sess,tty,comm
PID PPID PGID SESS TT COMMAND
2175912 2172353 2175912 2172353 pts/7 buildx
2175934 3904 2175923 2175923 ? buildx
The detached server lives even after the client exits.
$ ps -C buildx -o pid,ppid,pgid,sess,tty,comm
PID PPID PGID SESS TT COMMAND
2175934 3904 2175923 2175923 ? buildx
list
list
lists available server instances.
USING
will be ture
if the current monitor connects to that instance.
(buildx) list
ID PID STARTED USING
buildx3846920277 2176183 40 seconds ago true
buildx902825017 2175934 3 minutes ago false
attach
attach
attaches to the specified instance.
Monitor commands like reload
and rollback
can be issued to the newly connected instance.
(buildx) attach buildx902825017
Interactive container was restarted. Press Ctrl-a-c to switch to the new container
(buildx) list
ID PID STARTED USING
buildx3846920277 2176183 About a minute ago false
buildx902825017 2175934 3 minutes ago true
(buildx) reload
[+] Building 0.4s (5/5) FINISHED
=> [internal] load .dockerignore 0.1s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 111B 0.0s
=> [internal] load metadata for ghcr.io/stargz-containers/busybox:1.32.0-org 0.3s
=> [1/2] FROM ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678 0.0s
=> => resolve ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678 0.0s
=> CACHED [2/2] RUN echo hello > /hello 0.0s
Interactive container was restarted. Press Ctrl-a-c to switch to the new container
(buildx) Switched IO
/ # cat hello
hello
kill
kill
kills the specified instance.
If no argument is provided, it kills the connecting instance.
(buildx) list
ID PID STARTED USING
buildx3846920277 2176183 About a minute ago false
buildx902825017 2175934 4 minutes ago true
(buildx) kill buildx3846920277
(buildx) list
ID PID STARTED USING
buildx902825017 2175934 4 minutes ago true
(buildx) kill
(buildx) list
ID PID STARTED USING
How it works
buildx build
forks and detaches itself as a server daemon. After launching the detached server, the foreground one continues running as a client.
Each buildx server instance provides gRPC API via socket (e.g. ~/.local/share/buildx/<instance-name>/buildx.sock
) for serving build and invoke functionalities. buildx client provides monitor prompt and forwards I/O. It issues builds and invokes to the server everytime the user requests on the monitor.
I/O for the build (e.g. Dockerfile via stdin) and container (e.g. stdio) are forwarded via gRPC API similarly done by BuildKit's NewContainer
gateway API.
Please note that the server doesn't support multiple clients as of now. Every time the user runs buildx build
, it launches a new server process with a newly created socket. If a client requests a build to the server while another client's build is ongoing, the old one is cancelled and the new one starts. In the future, the server can support parallel multiple builds and invokes. Maybe we need some additional refactorings for the buildx build
implementation for supporting parallel calls.
TODOs
-
Considerations
-
attach
command internally runsrollback
command for detaching the old client and attaching to the new client. Should we allow this without restarting the container?
-
-
TODOs
- Discuss and determine the design
- Discuss about the design of CI
- The following files have been copied from BuildKit for generating protobuf-related code.
- hack/dockerfiles/generated-files.Dockerfile
- hack/update-generated-files
- hack/validate-generated-files
- The following files have been copied from BuildKit for generating protobuf-related code.
- Add tests
cc @tonistiigi @jedevc
I did think that the server process would be shared for all builds. I wonder which one would be preferred. Instance per build uses more memory but might be simpler and maybe also better for upgrades.
Yes, I agree with this. Both sound good to me though I'm currently taking the approch of server per build because it was easier to implement. How should the shared buildx server be managed? Should we let the user manage & upgrade buildx server by their own (using something like systemd)?
FROM ghcr.io/stargz-containers/busybox:1.32.0-org
RUN echo hello > /hello
$ BUILDX_EXPERIMENTAL=1 docker buildx build --invoke sh .
#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 111B done
#2 DONE 0.0s
#3 [internal] load metadata for ghcr.io/stargz-containers/busybox:1.32.0-org
#3 DONE 0.2s
#4 [1/2] FROM ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678
#4 resolve ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678 0.0s done
#4 DONE 0.0s
#5 [2/2] RUN echo hello > /hello
#5 CACHED
Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ #
Ctrl-a-c
doesn't not seem to switch to monitor console:
Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ # ^C
But if I exit
, it switches though:
Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ # exit
Switched IO
(buildx)
$ uname -a
Linux pc-sff 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
@ktock Let me know if you need more info
@crazy-max Thank you for trying this.
Let me know if you need more info
- Is this reproduced on the
master
branch version's buildx as well? - And please make sure that the toggle keys are not
Ctrl-a
+Ctrl-c
butCtrl-a
+c
. - Could you provide
docker buildx inspect
?
Couple notes while experimenting :tada: (awesome work btw)
Ctrl-a-c doesn't not seem to switch to monitor console
Also having this.
-
Is this reproduced on the master branch version's buildx as well? No, works fine on
master
. -
And please make sure that the toggle keys are not Ctrl-a + Ctrl-c but Ctrl-a + c. Confirmed, yup.
-
Could you provide docker buildx inspect ? (from a docker-desktop setup, but can also repro with a native docker install)
$ buildx inspect container Name: container Driver: docker-container Nodes: Name: container0 Endpoint: desktop-linux Status: running Buildkit: v0.10.4 Platforms: linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
A socket should be created under ~/.buildx and even if the current process (unexpectedly) dies its state can be accessed again via the socket.
We don't seem to do this - the server process seems to remain active indefinitely until kill-ed. I think we should probably aim to have the client send a disconnect
message if it exits cleanly (separate from the kill
message, since when we have multiple clients to a single server, they should all disconnect
before dying).
Please note that the server doesn't support multiple clients as of now. Every time the user runs buildx build, it launches a new server process with a newly created socket
The error messages aren't super clear for this atm. If we disconnect a client, ideally the server should send some reason
to print out.
I did think that the server process would be shared for all builds.
I'd prefer something like this as well, though the split approach would keep crashes isolated to one build - a broken server taking out all the debugging sessions in process would definitely be a frustrating experience.
If we do go shared, I think we should go minimal to start, and simply manage the server by starting it if it's not already running, otherwise, connecting to the existing instance. If we detect a version mismatch, we should restart the server automatically after asking the user to disconnect all their debugging sessions. We could look into managing it with systemd/etc as an advanced use case later, but I think out-of-the-box experience should be low-friction (e.g. buildx can run inside a container, where systemd might not be available).
attach command internally runs rollback command for detaching the old client and attaching to the new client. Should we allow this without restarting the container?
I think this depends - if we want to allow multiple debugging clients for a single server, and those should both be able to open shells, then they need to be separate shells (or stdin/stdout/stderr will get messy), so we'd always restart the container. If we're ok with only a single shell at the same time, I think it would be nice to switch over without restarting.
Thank you for the review. Updated the patch based on the comments.
Ctrl-a-c doesn't not seem to switch to monitor console:
Fixed this.
I did think that the server process would be shared for all builds.
Fixed to share one server among builds.
disconnect
Added.
detect a version mismatch
Fixed to print a warning on version mismatch.
FROM ghcr.io/stargz-containers/busybox:1.32.0-org
RUN echo hello > /hello
buildx build
- One shared buildx server
- Launched automatically when no buildx server is found.
$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx build --invoke sh /tmp/ctx
INFO: no buildx server found; launching...
[+] Building 0.4s (5/5) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 111B 0.0s
=> [internal] load metadata for ghcr.io/stargz-containers/busybox:1.32.0-org 0.4s
=> [1/2] FROM ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4 0.0s
=> => resolve ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4 0.0s
=> CACHED [2/2] RUN echo hello > /hello 0.0s
Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ #
list
(buildx) list
ID CURRENT_SESSION
k0y9wsvamlh6rt7rzhsb3g3vx false
xtwwz5l2tonpbqodc27uf3ztz true
attach
(buildx) list
ID CURRENT_SESSION
k0y9wsvamlh6rt7rzhsb3g3vx false
xtwwz5l2tonpbqodc27uf3ztz true
(buildx) attach k0y9wsvamlh6rt7rzhsb3g3vx
(buildx) list
ID CURRENT_SESSION
k0y9wsvamlh6rt7rzhsb3g3vx true
xtwwz5l2tonpbqodc27uf3ztz false
disconnect
Ctrl-D disconnects the session.
(buildx) [Ctrl-D]
From another terminal, we can see that npvjmblid3xyill4mltoum00k
is disconnected:
(buildx) list
ID CURRENT_SESSION
s1ainu0yjs3pgt9oplurlaqpc true
npvjmblid3xyill4mltoum00k false
(buildx) list
ID CURRENT_SESSION
s1ainu0yjs3pgt9oplurlaqpc true
disconnect
command is also provided for explicit disconnecting.
(buildx) list
ID CURRENT_SESSION
k0y9wsvamlh6rt7rzhsb3g3vx true
xtwwz5l2tonpbqodc27uf3ztz false
(buildx) disconnect
(buildx) list
ID CURRENT_SESSION
xtwwz5l2tonpbqodc27uf3ztz false
version mismatch
version mismatch
warning is printed when the versions of client and server aren't the same.
$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx build --invoke sh /tmp/ctx
WARNING: version mismatch (server: "github.com/docker/buildx 0.0.0+unknown ", client: "github.com/docker/buildx 0.0.0+unknown test"); please kill and restart buildx server
I like the refactor to use a single server :tada: Left a couple high-level comments, got a few notes I found while playing:
-
list
,reload
, etc feel clunky to me. I think we should go with commands that feel more "familiar" - maybereload
->build
andlist
tostatus
.attach
anddisconnect
don't match up, we should haveattach
anddetach
orconnect
anddisconnect
.kill
kills the whole server - that was unexpected, when other clients were connected! - I keep looking around for a command that will just drop in to monitor mode directly, without running a build - I think the only way to get there atm is to start a dummy build, then disconnect.
- Connecting multiple clients to a single session is weird. I think only one is supported, but when attaching a second client, it appears to work - with both
list
s in each client saying they're connected - even though the output has already been switched away from that first client. I think we should be more explicit with disconnecting clients when someone else connects. - We can leak sessions fairly easily :smile: If I have two separate clients in two separate sessions, then connect client 1 to session 2, then session 1 is owned by no one, then closing all the clients leaves that session unowned. Not sure what we should do here - should we kill a session the moment everyone has called disconnect from it (so it should still be preserved from a client crash), or should we leave it open until the client that opened it has exited?
I'm happy if we circle back and address these later, just wanted to list them out so I don't forget!
@jedevc Thank you for the review.
We can leak sessions fairly easily smile If I have two separate clients in two separate sessions, then connect client 1 to session 2, then session 1 is owned by no one, then closing all the clients leaves that session unowned. Not sure what we should do here - should we kill a session the moment everyone has called disconnect from it (so it should still be preserved from a client crash), or should we leave it open until the client that opened it has exited?
disconnect
command can take an arg to forcibly disconnect a specific session. So we can manually kill a specific session.
list, reload, etc feel clunky to me. I think we should go with commands that feel more "familiar" - maybe reload -> build and list to status. attach and disconnect don't match up, we should have attach and detach or connect and disconnect. kill kills the whole server - that was unexpected, when other clients were connected!
I keep looking around for a command that will just drop in to monitor mode directly, without running a build - I think the only way to get there atm is to start a dummy build, then disconnect.
Connecting multiple clients to a single session is weird. I think only one is supported, but when attaching a second client, it appears to work - with both lists in each client saying they're connected - even though the output has already been switched away from that first client. I think we should be more explicit with disconnecting clients when someone else connects.
Thank you for the suggestion. Could we address them in following-up patches as this PR is already very big?
Hey @ktock sorry it's taking so long to review :tada: I'm (personally) happy with the design, and would be happy to approve soon.
I think the design is good atm, the interface feels quite natural, but there is some complexity there we should address in a follow-up (IMO). My comments are mostly about making the code clear so I'd feel comfortable working on it in the future myself :smile:
A couple notes:
- I'm having difficulty understanding the
ioset
package, especially theSingleForwarder
. Some godoc and some inline comments explaining what's going on and what it's for would be really helpful (I've spent quite a while trying to work it out, and it's not clicking for me!) -
ioset
'sIn
/Out
should probably be calledPipeReader
/PipeWriter
, since they're created by aPipe
method (ideally they'd also have some way of making sure that the caller can't accidentally callClose
on theStdin
/Stdout
/Stderr
directly). Again, some godoc would be nice :heart: - There's a few functions all with similar names and behaviors
runBuild
/runBuildOnBuilder
/runBuildContext
/buildTargets
. This is kind of tricky to track through the code - some clearer names (and godoc) would be nice here. - We should avoid calling these "builders" - that name is already used for when we create drivers, and makes some of the code tricky to follow. We've already been using
controllers
(withBuildxController
) which I quite like, but we should make sure that the whole PR follows the same standard.
Some things for follow-ups:
- Things mentioned here. Yup this PR is getting long, it'll be easier to work on these in-tree (and then we can multitask a bit easier) :tada:
- We should refactor out the different controllers (local and remote) into a separate package. I tried having a look at this myself, but as is, there's some strong coupling between the cli and the controllers. Even if that gets nicely separated out, we end up in a scenario where build and bake still use the same utilities, which makes things tricky. This ended up being kind of a time sync, and IMO, we should probably first work out how we want this design to interact with bake before going too much further. I'm happy to take a look at this once we merge, I already spent some time getting familiar with what will make this tricky.
- I wonder if we should prefer a config file in a known location like
~/.docker/buildx
- instead of requiring it be passed each time :thinking:
@jedevc Thank you for your comments.
I'm having difficulty understanding the ioset package, especially the SingleForwarder. Some godoc and some inline comments explaining what's going on and what it's for would be really helpful (I've spent quite a while trying to work it out, and it's not clicking for me!) ioset's In/Out should probably be called PipeReader/PipeWriter, since they're created by a Pipe method (ideally they'd also have some way of making sure that the caller can't accidentally call Close on the Stdin/Stdout/Stderr directly). Again, some godoc would be nice heart
Added godoc to ioset
package.
There's a few functions all with similar names and behaviors runBuild/runBuildOnBuilder/runBuildContext/buildTargets. This is kind of tricky to track through the code - some clearer names (and godoc) would be nice here.
Tried to make it clearer by renaming some of them as the following.
-
runBuildContext
->runBuildWithContext
-
runBuildOnBuilder
->launchControllerAndRunBuild
-
runBuild
->runBuild
(unchanged) -
buildTargets
->buildTargets
(unchanged)
Please feel free to apply more changes (maybe in following-up PRs) if you come up with the better names.
We should avoid calling these "builders" - that name is already used for when we create drivers, and makes some of the code tricky to follow. We've already been using controllers (with BuildxController) which I quite like, but we should make sure that the whole PR follows the same standard.
Renamed builder
to controller
.
Some things for follow-ups:
Things mentioned here. Yup this PR is getting long, it'll be easier to work on these in-tree (and then we can multitask a bit easier) tada We should refactor out the different controllers (local and remote) into a separate package. I tried having a look at this myself, but as is, there's some strong coupling between the cli and the controllers. Even if that gets nicely separated out, we end up in a scenario where build and bake still use the same utilities, which makes things tricky. This ended up being kind of a time sync, and IMO, we should probably first work out how we want this design to interact with bake before going too much further. I'm happy to take a look at this once we merge, I already spent some time getting familiar with what will make this tricky.
Let's work on it in following up PRs.