buildx monitor: Enable to run build and invoke in background

monitor: Enable to run build and invoke in background

Open ktock opened this issue 1 year ago • 7 comments

#1104

This commit enables to launch builds in a background process and adds some related commands. Please give me feedbacks on the design of the detaching and the CLI.

This covers PR4 and PR5 of #1104.

PR4 Refactor build command to invoke builds in a background process. This is important as we don't want the lifecycle of "debugging session" to be locked into a single process. A socket should be created under ~/.buildx and even if the current process (unexpectedly) dies its state can be accessed again via the socket.

PR5 Add list and attach commands to the monitor mode. List would show all the current active sessions (via the socket described in the previous section). Attach would make that session active in current process. If the session is already active in another process these processes would detach and go to the monitor mode.

Example:

BUILDX_EXPERIMENTAL=1 is required to enable detached buildx.

FROM ghcr.io/stargz-containers/busybox:1.32.0-org
RUN echo hello > /hello

buildx build

$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx build --invoke sh /tmp/ctx
[+] Building 3.8s (6/6) FINISHED                                                                                                             
 => [internal] load build definition from Dockerfile                                                                                    0.1s
 => => transferring dockerfile: 111B                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                       0.1s
 => => transferring context: 2B                                                                                                         0.0s
 => [internal] load metadata for ghcr.io/stargz-containers/busybox:1.32.0-org                                                           2.6s
 => [auth] stargz-containers/busybox:pull token for ghcr.io                                                                             0.0s
 => [1/2] FROM ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678     0.9s
 => => resolve ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678     0.0s
 => => sha256:ea97eb0eb3ec0bbe00d90be500431050af63c31dc0706c2e8fb53e16cff7761f 764.63kB / 764.63kB                                      0.8s
 => => extracting sha256:ea97eb0eb3ec0bbe00d90be500431050af63c31dc0706c2e8fb53e16cff7761f                                               0.1s
 => [2/2] RUN echo hello > /hello                                                                                                       0.1s
Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ #

The above command launches two buildx processes: client and server. The server is detached in a separated session. The detached buildx server actually runs the build and the foreground client buildx forwards I/O and provides monitor prompt.

ps command on the host:

$ ps -C buildx -o pid,ppid,pgid,sess,tty,comm
    PID    PPID    PGID    SESS TT       COMMAND
2175912 2172353 2175912 2172353 pts/7    buildx
2175934    3904 2175923 2175923 ?        buildx

The detached server lives even after the client exits.

$ ps -C buildx -o pid,ppid,pgid,sess,tty,comm
    PID    PPID    PGID    SESS TT       COMMAND
2175934    3904 2175923 2175923 ?        buildx

list

list lists available server instances. USING will be ture if the current monitor connects to that instance.

(buildx) list
ID			PID		STARTED		USING
buildx3846920277    	2176183 	40 seconds ago	true
buildx902825017     	2175934 	3 minutes ago	false

attach

attach attaches to the specified instance. Monitor commands like reload and rollback can be issued to the newly connected instance.

(buildx) attach buildx902825017
Interactive container was restarted. Press Ctrl-a-c to switch to the new container
(buildx) list
ID			PID		STARTED			USING
buildx3846920277    	2176183 	About a minute ago	false
buildx902825017     	2175934 	3 minutes ago		true
(buildx) reload
[+] Building 0.4s (5/5) FINISHED                                                                                                             
 => [internal] load .dockerignore                                                                                                       0.1s
 => => transferring context: 2B                                                                                                         0.0s
 => [internal] load build definition from Dockerfile                                                                                    0.0s
 => => transferring dockerfile: 111B                                                                                                    0.0s
 => [internal] load metadata for ghcr.io/stargz-containers/busybox:1.32.0-org                                                           0.3s
 => [1/2] FROM ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678     0.0s
 => => resolve ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678     0.0s
 => CACHED [2/2] RUN echo hello > /hello                                                                                                0.0s

Interactive container was restarted. Press Ctrl-a-c to switch to the new container
(buildx) Switched IO

/ # cat hello
hello

kill

kill kills the specified instance. If no argument is provided, it kills the connecting instance.

(buildx) list
ID			PID		STARTED			USING
buildx3846920277    	2176183 	About a minute ago	false
buildx902825017     	2175934 	4 minutes ago		true
(buildx) kill buildx3846920277
(buildx) list
ID			PID		STARTED		USING
buildx902825017     	2175934 	4 minutes ago	true
(buildx) kill
(buildx) list
ID	PID	STARTED	USING

How it works

buildx build forks and detaches itself as a server daemon. After launching the detached server, the foreground one continues running as a client.

Each buildx server instance provides gRPC API via socket (e.g. ~/.local/share/buildx/<instance-name>/buildx.sock) for serving build and invoke functionalities. buildx client provides monitor prompt and forwards I/O. It issues builds and invokes to the server everytime the user requests on the monitor.

I/O for the build (e.g. Dockerfile via stdin) and container (e.g. stdio) are forwarded via gRPC API similarly done by BuildKit's NewContainer gateway API.

Please note that the server doesn't support multiple clients as of now. Every time the user runs buildx build, it launches a new server process with a newly created socket. If a client requests a build to the server while another client's build is ongoing, the old one is cancelled and the new one starts. In the future, the server can support parallel multiple builds and invokes. Maybe we need some additional refactorings for the buildx build implementation for supporting parallel calls.

TODOs

Considerations
- attach command internally runs rollback command for detaching the old client and attaching to the new client. Should we allow this without restarting the container?
TODOs
- Discuss and determine the design
- Discuss about the design of CI
  - The following files have been copied from BuildKit for generating protobuf-related code.
    - hack/dockerfiles/generated-files.Dockerfile
    - hack/update-generated-files
    - hack/validate-generated-files
- Add tests

Aug 29 '22 07:08 ktock

cc @tonistiigi @jedevc

Aug 29 '22 08:08 ktock

I did think that the server process would be shared for all builds. I wonder which one would be preferred. Instance per build uses more memory but might be simpler and maybe also better for upgrades.

Yes, I agree with this. Both sound good to me though I'm currently taking the approch of server per build because it was easier to implement. How should the shared buildx server be managed? Should we let the user manage & upgrade buildx server by their own (using something like systemd)?

Sep 01 '22 05:09 ktock

FROM ghcr.io/stargz-containers/busybox:1.32.0-org
RUN echo hello > /hello

$ BUILDX_EXPERIMENTAL=1 docker buildx build --invoke sh .
#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 111B done
#2 DONE 0.0s

#3 [internal] load metadata for ghcr.io/stargz-containers/busybox:1.32.0-org
#3 DONE 0.2s

#4 [1/2] FROM ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678
#4 resolve ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4301591227ce925f3c678 0.0s done
#4 DONE 0.0s

#5 [2/2] RUN echo hello > /hello
#5 CACHED
Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ #

Ctrl-a-c doesn't not seem to switch to monitor console:

Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ # ^C

But if I exit, it switches though:

Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ # exit
Switched IO
(buildx)

$ uname -a
Linux pc-sff 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

@ktock Let me know if you need more info

Sep 19 '22 14:09 crazy-max

@crazy-max Thank you for trying this.

Let me know if you need more info

Is this reproduced on the master branch version's buildx as well?
And please make sure that the toggle keys are not Ctrl-a + Ctrl-c but Ctrl-a + c.
Could you provide docker buildx inspect ?

Sep 20 '22 00:09 ktock

Couple notes while experimenting :tada: (awesome work btw)

Ctrl-a-c doesn't not seem to switch to monitor console

Also having this.

Is this reproduced on the master branch version's buildx as well? No, works fine on master.
And please make sure that the toggle keys are not Ctrl-a + Ctrl-c but Ctrl-a + c. Confirmed, yup.

Could you provide docker buildx inspect ? (from a docker-desktop setup, but can also repro with a native docker install)

$ buildx inspect container
Name:   container
Driver: docker-container

Nodes:
Name:      container0
Endpoint:  desktop-linux
Status:    running
Buildkit:  v0.10.4
Platforms: linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

A socket should be created under ~/.buildx and even if the current process (unexpectedly) dies its state can be accessed again via the socket.

We don't seem to do this - the server process seems to remain active indefinitely until kill-ed. I think we should probably aim to have the client send a disconnect message if it exits cleanly (separate from the kill message, since when we have multiple clients to a single server, they should all disconnect before dying).

Please note that the server doesn't support multiple clients as of now. Every time the user runs buildx build, it launches a new server process with a newly created socket

The error messages aren't super clear for this atm. If we disconnect a client, ideally the server should send some reason to print out.

I did think that the server process would be shared for all builds.

I'd prefer something like this as well, though the split approach would keep crashes isolated to one build - a broken server taking out all the debugging sessions in process would definitely be a frustrating experience.

If we do go shared, I think we should go minimal to start, and simply manage the server by starting it if it's not already running, otherwise, connecting to the existing instance. If we detect a version mismatch, we should restart the server automatically after asking the user to disconnect all their debugging sessions. We could look into managing it with systemd/etc as an advanced use case later, but I think out-of-the-box experience should be low-friction (e.g. buildx can run inside a container, where systemd might not be available).

attach command internally runs rollback command for detaching the old client and attaching to the new client. Should we allow this without restarting the container?

I think this depends - if we want to allow multiple debugging clients for a single server, and those should both be able to open shells, then they need to be separate shells (or stdin/stdout/stderr will get messy), so we'd always restart the container. If we're ok with only a single shell at the same time, I think it would be nice to switch over without restarting.

Sep 20 '22 11:09 jedevc

Thank you for the review. Updated the patch based on the comments.

Ctrl-a-c doesn't not seem to switch to monitor console:

Fixed this.

I did think that the server process would be shared for all builds.

Fixed to share one server among builds.

disconnect

Added.

detect a version mismatch

Fixed to print a warning on version mismatch.

FROM ghcr.io/stargz-containers/busybox:1.32.0-org
RUN echo hello > /hello

buildx build

One shared buildx server
Launched automatically when no buildx server is found.

$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx build --invoke sh /tmp/ctx
INFO: no buildx server found; launching...
[+] Building 0.4s (5/5) FINISHED                                                                                      
 => [internal] load .dockerignore                                                                                0.0s
 => => transferring context: 2B                                                                                  0.0s
 => [internal] load build definition from Dockerfile                                                             0.0s
 => => transferring dockerfile: 111B                                                                             0.0s
 => [internal] load metadata for ghcr.io/stargz-containers/busybox:1.32.0-org                                    0.4s
 => [1/2] FROM ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4  0.0s
 => => resolve ghcr.io/stargz-containers/busybox:1.32.0-org@sha256:bde48e1751173b709090c2539fdf12d6ba64e88ec7a4  0.0s
 => CACHED [2/2] RUN echo hello > /hello                                                                         0.0s
Launching interactive container. Press Ctrl-a-c to switch to monitor console
/ #

list

(buildx) list
ID				CURRENT_SESSION
k0y9wsvamlh6rt7rzhsb3g3vx	false
xtwwz5l2tonpbqodc27uf3ztz	true

attach

(buildx) list
ID				CURRENT_SESSION
k0y9wsvamlh6rt7rzhsb3g3vx	false
xtwwz5l2tonpbqodc27uf3ztz	true
(buildx) attach k0y9wsvamlh6rt7rzhsb3g3vx
(buildx) list
ID				CURRENT_SESSION
k0y9wsvamlh6rt7rzhsb3g3vx	true
xtwwz5l2tonpbqodc27uf3ztz	false

disconnect

Ctrl-D disconnects the session.

(buildx) [Ctrl-D]

From another terminal, we can see that npvjmblid3xyill4mltoum00k is disconnected:

(buildx) list
ID				CURRENT_SESSION
s1ainu0yjs3pgt9oplurlaqpc	true
npvjmblid3xyill4mltoum00k	false
(buildx) list
ID				CURRENT_SESSION
s1ainu0yjs3pgt9oplurlaqpc	true

disconnect command is also provided for explicit disconnecting.

(buildx) list
ID				CURRENT_SESSION
k0y9wsvamlh6rt7rzhsb3g3vx	true
xtwwz5l2tonpbqodc27uf3ztz	false
(buildx) disconnect 
(buildx) list
ID				CURRENT_SESSION
xtwwz5l2tonpbqodc27uf3ztz	false

version mismatch

version mismatch warning is printed when the versions of client and server aren't the same.

$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx build --invoke sh /tmp/ctx
WARNING: version mismatch (server: "github.com/docker/buildx 0.0.0+unknown ", client: "github.com/docker/buildx 0.0.0+unknown test"); please kill and restart buildx server

Oct 12 '22 00:10 ktock

I like the refactor to use a single server :tada: Left a couple high-level comments, got a few notes I found while playing:

list, reload, etc feel clunky to me. I think we should go with commands that feel more "familiar" - maybe reload -> build and list to status. attach and disconnect don't match up, we should have attach and detach or connect and disconnect. kill kills the whole server - that was unexpected, when other clients were connected!
I keep looking around for a command that will just drop in to monitor mode directly, without running a build - I think the only way to get there atm is to start a dummy build, then disconnect.
Connecting multiple clients to a single session is weird. I think only one is supported, but when attaching a second client, it appears to work - with both lists in each client saying they're connected - even though the output has already been switched away from that first client. I think we should be more explicit with disconnecting clients when someone else connects.
We can leak sessions fairly easily :smile: If I have two separate clients in two separate sessions, then connect client 1 to session 2, then session 1 is owned by no one, then closing all the clients leaves that session unowned. Not sure what we should do here - should we kill a session the moment everyone has called disconnect from it (so it should still be preserved from a client crash), or should we leave it open until the client that opened it has exited?

I'm happy if we circle back and address these later, just wanted to list them out so I don't forget!

Oct 12 '22 10:10 jedevc

@jedevc Thank you for the review.

We can leak sessions fairly easily smile If I have two separate clients in two separate sessions, then connect client 1 to session 2, then session 1 is owned by no one, then closing all the clients leaves that session unowned. Not sure what we should do here - should we kill a session the moment everyone has called disconnect from it (so it should still be preserved from a client crash), or should we leave it open until the client that opened it has exited?

disconnect command can take an arg to forcibly disconnect a specific session. So we can manually kill a specific session.

list, reload, etc feel clunky to me. I think we should go with commands that feel more "familiar" - maybe reload -> build and list to status. attach and disconnect don't match up, we should have attach and detach or connect and disconnect. kill kills the whole server - that was unexpected, when other clients were connected!

I keep looking around for a command that will just drop in to monitor mode directly, without running a build - I think the only way to get there atm is to start a dummy build, then disconnect.

Connecting multiple clients to a single session is weird. I think only one is supported, but when attaching a second client, it appears to work - with both lists in each client saying they're connected - even though the output has already been switched away from that first client. I think we should be more explicit with disconnecting clients when someone else connects.

Thank you for the suggestion. Could we address them in following-up patches as this PR is already very big?

Oct 21 '22 12:10 ktock

Hey @ktock sorry it's taking so long to review :tada: I'm (personally) happy with the design, and would be happy to approve soon.

I think the design is good atm, the interface feels quite natural, but there is some complexity there we should address in a follow-up (IMO). My comments are mostly about making the code clear so I'd feel comfortable working on it in the future myself :smile:

A couple notes:

I'm having difficulty understanding the ioset package, especially the SingleForwarder. Some godoc and some inline comments explaining what's going on and what it's for would be really helpful (I've spent quite a while trying to work it out, and it's not clicking for me!)
ioset's In/Out should probably be called PipeReader/PipeWriter, since they're created by a Pipe method (ideally they'd also have some way of making sure that the caller can't accidentally call Close on the Stdin/Stdout/Stderr directly). Again, some godoc would be nice :heart:
There's a few functions all with similar names and behaviors runBuild/runBuildOnBuilder/runBuildContext/buildTargets. This is kind of tricky to track through the code - some clearer names (and godoc) would be nice here.
We should avoid calling these "builders" - that name is already used for when we create drivers, and makes some of the code tricky to follow. We've already been using controllers (with BuildxController) which I quite like, but we should make sure that the whole PR follows the same standard.

Some things for follow-ups:

Things mentioned here. Yup this PR is getting long, it'll be easier to work on these in-tree (and then we can multitask a bit easier) :tada:
We should refactor out the different controllers (local and remote) into a separate package. I tried having a look at this myself, but as is, there's some strong coupling between the cli and the controllers. Even if that gets nicely separated out, we end up in a scenario where build and bake still use the same utilities, which makes things tricky. This ended up being kind of a time sync, and IMO, we should probably first work out how we want this design to interact with bake before going too much further. I'm happy to take a look at this once we merge, I already spent some time getting familiar with what will make this tricky.
I wonder if we should prefer a config file in a known location like ~/.docker/buildx - instead of requiring it be passed each time :thinking:

Oct 25 '22 16:10 jedevc

@jedevc Thank you for your comments.

I'm having difficulty understanding the ioset package, especially the SingleForwarder. Some godoc and some inline comments explaining what's going on and what it's for would be really helpful (I've spent quite a while trying to work it out, and it's not clicking for me!) ioset's In/Out should probably be called PipeReader/PipeWriter, since they're created by a Pipe method (ideally they'd also have some way of making sure that the caller can't accidentally call Close on the Stdin/Stdout/Stderr directly). Again, some godoc would be nice heart

Added godoc to ioset package.

There's a few functions all with similar names and behaviors runBuild/runBuildOnBuilder/runBuildContext/buildTargets. This is kind of tricky to track through the code - some clearer names (and godoc) would be nice here.

Tried to make it clearer by renaming some of them as the following.

runBuildContext -> runBuildWithContext
runBuildOnBuilder -> launchControllerAndRunBuild
runBuild -> runBuild (unchanged)
buildTargets -> buildTargets (unchanged)

Please feel free to apply more changes (maybe in following-up PRs) if you come up with the better names.

We should avoid calling these "builders" - that name is already used for when we create drivers, and makes some of the code tricky to follow. We've already been using controllers (with BuildxController) which I quite like, but we should make sure that the whole PR follows the same standard.

Renamed builder to controller.

Some things for follow-ups:

Things mentioned here. Yup this PR is getting long, it'll be easier to work on these in-tree (and then we can multitask a bit easier) tada We should refactor out the different controllers (local and remote) into a separate package. I tried having a look at this myself, but as is, there's some strong coupling between the cli and the controllers. Even if that gets nicely separated out, we end up in a scenario where build and bake still use the same utilities, which makes things tricky. This ended up being kind of a time sync, and IMO, we should probably first work out how we want this design to interact with bake before going too much further. I'm happy to take a look at this once we merge, I already spent some time getting familiar with what will make this tricky.

Let's work on it in following up PRs.

Nov 07 '22 11:11 ktock

buildx buildx copied to clipboard

monitor: Enable to run build and invoke in background

Example:

buildx build

list

attach

kill

How it works

TODOs

buildx build

list

attach

disconnect

version mismatch

buildx
buildx copied to clipboard