bollard icon indicating copy to clipboard operation
bollard copied to clipboard

container exits but output stream remains open

Open tdyas opened this issue 2 years ago • 10 comments

I am trying to capture output from a running container using Docker::attach_container but it appears that the output stream does not close even though the container exited (and which was reported via the wait_container stream). The code at issue is in this PR https://github.com/pantsbuild/pants/pull/16766.

That code monitors the output stream returned by attach_container and the stream from wait_container for the exit code with a tokio::select!. When the container exits, the code breaks out of the monitoring loop and then tries to see if any output remains on the output stream, but that stream never closes so the code hangs there.

Am I somehow not using the APIs correctly? (The machine is macOS 12.5.1 with Docker Desktop 4.11.1.)

tdyas avatar Sep 03 '22 23:09 tdyas

So, with the 0.13 version of Bollard, wait_container will only emit an Err if the Docker server responds with a non-200 status code - for example, if the container doesn't exist or some other network related error. In that version of Bollard, you could capture the status code emitted by the wait_container operation and handle it how you need to.

However, I've just merged a PR that should change that behaviour https://github.com/fussybeaver/bollard/pull/250 - in the latest master, streams that return an error in the JSON will also return an error in Bollard.

Let me know in your testing if that works for you.

fussybeaver avatar Sep 04 '22 07:09 fussybeaver

So, with the 0.13 version of Bollard, wait_container will only emit an Err if the Docker server responds with a non-200 status code - for example, if the container doesn't exist or some other network related error. In that version of Bollard, you could capture the status code emitted by the wait_container operation and handle it how you need to.

To clarify, the issue that I am seeing is not with the wait_container stream. That stream actually does report the container exit correctly. The issue is that the attach_container output stream does not close once the container has gone away. This means that, after the container exit is reported on the wait_container stream, the code that checks the attach_container output stream hangs because the attach_container output stream never closes.

The context is that the project in question is a build tool that needs to capture all output from running a build action in a Docker container. I observed a race condition in testing where my code would fail to capture some output. My thought was to check the attach_container output stream to make sure there was no "left-over" output remaining on that stream after the container exit was reported.

Regarding https://github.com/fussybeaver/bollard/pull/250, I will try it out with the hope that maybe that it will change the behavior of the attach_container output stream.

tdyas avatar Sep 04 '22 17:09 tdyas

I tried the latest master branch (at cfebd306a4). No change in behavior; the attach_container still remains open even after the container has exited (and as reported on the wait_container stream).

test.log

tdyas avatar Sep 04 '22 21:09 tdyas

I'm having similar issues - or so I think, at least. I may have a repro case too, one that isn't particularly big, either. I have to clean it up a little first, it's in a private repo at the moment, have to lift it out of there.

algernon avatar Sep 04 '22 22:09 algernon

Ok, thanks for explaining a bit. That does sound quite different. I would appreciate a minimal test case -- @antoinert do you have any idea why this might be happening?

fussybeaver avatar Sep 05 '22 08:09 fussybeaver

Ok, thanks for explaining a bit. That does sound quite different. I would appreciate a minimal test case -- @antoinert do you have any idea why this might be happening?

https://github.com/fussybeaver/bollard/pull/254 has a minimal reproduction.

tdyas avatar Sep 06 '22 18:09 tdyas

Thanks for the reduced example.

I took a quick look, and can confirm that the streams are hanging. Some playing around though confirmed there is a timing issue with the running container and the client that attaches the output. If you change the cmd to vec["sleep", "3"] or execute attach_container prior to start_container, the streams are closed properly. So, there's a workaround there.

I'll be taking a closer look at the hyper logs to see if there's something else I can see.

fussybeaver avatar Sep 07 '22 18:09 fussybeaver

For my own use case, I ended up using executions (which return the output stream), and so this issue is not an issue for me any more. I can close unless someone else has a reason to keep it open.

In any event, it seems like the proper way to attach to a container would be to call attach_container first before calling start_container?

tdyas avatar Sep 13 '22 21:09 tdyas

Let's keep the ticket open.

I tried comparing the hyper / tokio logs between the two, but didn't see anything of notice. The server should have the responsibility of closing the stream, so it should be reproducible on the moby codebase - that's my next step regarding this ticket.

fussybeaver avatar Sep 14 '22 08:09 fussybeaver

I have done some more due diligence to this issue to try and understand what is going on, and replicated the example PR using the golang moby client SDK: https://github.com/moby/moby/compare/master...fussybeaver:moby:ND-attach-ordering-issue?expand=1

One thing struck me initially - removing the stream=true parameter means nothing is emitted. The documentation does suggest this though https://docs.docker.com/engine/api/v1.41/#tag/Container/operation/ContainerAttach

As demonstrated in this ticket (and in this closed moby ticket), the intended useage of the API is containerCreate, containerAttach, containerStart, containerWait. In moby, when containerAttach is triggered for a stopped container, the behaviour of the client SDK is to hang on the attach call. According to a similar bug ticket on the moby project, this is by design, in case you call attach before a container has started. I verified the hung client would continue by restarting a previously stopped container.

Similarly in Bollard, I assume that a stream will only close when a container is stopped. We should probably change the documentation to make this point clear when using this API. I'm open to any suggestions though on making this more ergonomic.

fussybeaver avatar Oct 09 '22 10:10 fussybeaver