go-dockerclient
AttachContainer sometimes hangs
Hi,
I'm fighting with a nasty bug. Sometimes, but not always (it's a bit hard to reproduce), AttachContainer hangs, eating 100% CPU. I profiled my program during the hang and ended up with this:
```
(pprof) top
Total: 3002 samples
    1906  63.5%  63.5%     3002 100.0% github.com/fsouza/go-dockerclient/utils.StdCopy
    1096  36.5% 100.0%     1096  36.5% bufio.(*Reader).Read
       0   0.0% 100.0%     3002 100.0% github.com/fsouza/go-dockerclient.func·001
       0   0.0% 100.0%     3002 100.0% runtime.gosched0
```
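(For reference, a CPU profile like the one above can be captured with the standard runtime/pprof package. A minimal sketch, not necessarily the setup used here:)

```go
import (
	"log"
	"os"
	"runtime/pprof"
)

// captureProfile is a hypothetical helper: it writes a CPU profile
// covering whatever runs between the Start and the deferred Stop.
func captureProfile() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()
	// ... call AttachToContainer here, then inspect the profile with:
	//     go tool pprof <binary> cpu.prof
}
```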
I'm using the latest go-dockerclient (master, e21b2a4562b3ec0731c54e59ab56c41c79067e12), and I see the same problem with older versions (tried back to 992e6abef8f797f41e1d7c90351056e3da5e141a).
My code is pretty standard:
```go
var buf bytes.Buffer
opts := docker.AttachToContainerOptions{
	Container:    w.container.ID,
	OutputStream: &buf,
	Logs:         true,
	Stdout:       true,
	Stderr:       true,
}
err = w.client.AttachToContainer(opts)
if err != nil {
	log.Printf("Error while fetching logs from container: %s\n", err)
}
```
This works "most of the time" (99% of runs).
I'm not sure if it's related to docker, but here's my configuration:
```
Containers: 0
Images: 206
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 206
Execution Driver: native-0.2
Kernel Version: 3.2.0-4-amd64
```
Any hint would be appreciated :)
Which Docker version are you using? I will try to reproduce it. If you're using Docker >= 0.11, does it happen with the Logs method as well? It uses a different endpoint in the Docker API.
Is your container running anything special? Or are you able to reproduce it with any command?
I didn't know about the Logs method, thanks for pointing it out. I will try it and report back in a few minutes.
My container doesn't run anything special, just a shell script that fetches Bower dependencies, and it doesn't always hang when I restart the process with the same params, just sometimes.
I'm running Docker 1.1.0. By the way, the container is already stopped (Exited (0) 2 minutes ago) when I try to fetch the output (logs).
OK, so with Logs it's a bit better, but it's killing our API server, because most of the results are invalid JSON. There's a weird char before the response from the container:
```
buf.String() ��{"name":"angular-resource","versions":["1.3.0-build.2893+sha.5963b5c"," [...]
```
(from this code:

```go
var buf bytes.Buffer
opts := docker.LogsOptions{
	Container:    w.container.ID,
	OutputStream: &buf,
	Stdout:       true,
	Stderr:       true,
}
err = w.client.Logs(opts)
if err != nil {
	log.Printf("Error while fetching logs from container: %s\n", err)
}
fmt.Printf("buf.String() %+v\n", buf.String())
```
)
Where docker run simply returns:
{"name":"angular-resource","versions":["1.3.0-build.2893+sha.5963b5c"," [...]
@gravis that's bizarre.
I'm able to reproduce the bug with the strange character in the Logs method; I will have a look.
I'm now trying to reproduce the attach issue.
Thanks for the reports!
That's "good" news: at least it's not related to my Docker install nor my image/container :) Thanks again for your time and patience.
@gravis the bug is in Docker's stream multiplexing: we need to interpret that leading character to find out whether the payload belongs to stdout or stderr. utils.StdCopy does that for us.
I'll push a fix soon. Still trying to figure out the AttachToContainer issue.
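(For context: in Docker's multiplexed attach/logs stream, every frame starts with an 8-byte header where byte 0 is the stream type (1 = stdout, 2 = stderr) and bytes 4-7 hold the payload length as a big-endian uint32. Below is a minimal demuxer sketch of that protocol, not go-dockerclient's actual StdCopy implementation:)

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// demux reads Docker's multiplexed stream from src and routes each
// frame's payload to stdout or stderr based on the first header byte.
func demux(stdout, stderr io.Writer, src io.Reader) error {
	var header [8]byte
	for {
		if _, err := io.ReadFull(src, header[:]); err != nil {
			if err == io.EOF {
				return nil // clean end of stream
			}
			return err
		}
		size := binary.BigEndian.Uint32(header[4:]) // bytes 4-7: payload length
		dst := stdout
		if header[0] == 2 {
			dst = stderr
		}
		if _, err := io.CopyN(dst, src, int64(size)); err != nil {
			return err
		}
	}
}

func main() {
	// Synthetic stdout frame (type byte 1) carrying "hello\n".
	payload := []byte("hello\n")
	frame := append([]byte{1, 0, 0, 0, 0, 0, 0, byte(len(payload))}, payload...)
	if err := demux(os.Stdout, os.Stderr, bytes.NewReader(frame)); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```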
Great!
Regarding AttachToContainer, were you able to reproduce it at least once?
@gravis nope, not yet :(
I fixed a bug in the StdCopy implementation; I don't think it would cause a hang, but it may be related.
I've also fixed the Logs issue. Can you give it another try (both for Logs and AttachToContainer)?
Great!
First test: Logs now hangs like AttachContainer (CPU at 100%), so the issue is probably located around hijack or StdCopy.
Logs doesn't use hijack, so it's probably StdCopy. I'm still unable to reproduce it, but I think we're closer to the solution now.
Please try setting RawTerminal: true on LogsOptions (or AttachToContainerOptions); you will get the bad characters again, but StdCopy won't get called.
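(In code, that means something like the snippet below: the same options as before with RawTerminal added. The output will then include the raw 8-byte frame headers, since StdCopy is bypassed:)

```go
opts := docker.LogsOptions{
	Container:    w.container.ID,
	OutputStream: &buf,
	Stdout:       true,
	Stderr:       true,
	RawTerminal:  true, // skip StdCopy; the header bytes stay in the output
}
err = w.client.Logs(opts)
```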
Ouch, AttachContainer is now returning weird chars from the container logs :(
```
d┘▒┼±⎺↑ce┌e⎼≤==2↓2↓7
d┘▒┼±⎺↑┐⎺└b┤>=▮↓9↓4
└☃└e⎻▒⎼⎽e>=▮↓1↓3
⎻≤├▒⎺┼↑d▒├e┤├☃┌==1↓5
⎻d°└☃┼e⎼==2▮11▮515
stripe==1.12.0
⎻≤c⎼≤⎻├⎺==2↓6
b⎺├⎺==2↓13↓3
d┘▒┼±⎺↑├☃└e≥⎺┼e⎽==▮↓2
d┘▒┼±⎺↑⎽e⎽==▮↓4↓1
Ge⎼▒┌d⎺==▮↓4↓16
d┘▒┼±⎺↑├▒⎽├≤⎻☃e==▮↓9↓12
┘⎺▒┼┼≤↑c▒c▒e==1↓4
d⎺c┤├☃┌⎽==▮↓9↓1
chardet==2.1.1
dj-database-url==0.2.1
psycopg2==2.4.5
gunicorn==0.17.2
django-s3-folder-storage==0.1
django-storages==1.1.5
python-memcached==1.48
W▒⎺⎺⎽▒==2↓4↓1
⎽☃└⎻┌e↑⎽▒┌e⎽°⎺⎼ce==▮↓51
```
(we should probably create a branch to test, master is getting messy :()
@gravis with or without RawTerminal? Seems like it's working fine here. I will run some tsuru integration tests against go-dockerclient's master.
Were you able to test it with RawTerminal: true? Does it hang?
Definitely voting for an issue with StdCopy. It works fine (so far) with RawTerminal: true!
Let's wait another hour to be sure.
It would have hung already. Using RawTerminal fixed it. Thanks.
There's something to fix in StdCopy :(
Good news: I've just managed to reproduce the hang in an automated test. It fails whenever StdCopy needs to grow its buffer. I will push a fix soon :)
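(A test along those lines can force the grow path by feeding StdCopy a single frame whose payload exceeds the initial buffer, 32 KiB in Docker's implementation. A sketch that assumes utils.StdCopy keeps the same signature as Docker's stdcopy; it is not the exact test that was committed:)

```go
package utils_test

import (
	"bytes"
	"encoding/binary"
	"testing"

	"github.com/fsouza/go-dockerclient/utils"
)

func TestStdCopyGrowsBuffer(t *testing.T) {
	// One stdout frame (type byte 1) whose 64 KiB payload is larger than
	// the initial buffer, forcing StdCopy to grow it instead of hanging.
	payload := bytes.Repeat([]byte("x"), 64*1024)
	header := make([]byte, 8)
	header[0] = 1
	binary.BigEndian.PutUint32(header[4:], uint32(len(payload)))
	src := bytes.NewReader(append(header, payload...))

	var stdout, stderr bytes.Buffer
	n, err := utils.StdCopy(&stdout, &stderr, src)
	if err != nil {
		t.Fatal(err)
	}
	if n != int64(len(payload)) || stdout.Len() != len(payload) {
		t.Fatalf("copied %d bytes to stdout, want %d", stdout.Len(), len(payload))
	}
}
```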
You're my hero :)
I think this may still be happening.
```
grafana-kminehart-linux ~ » docker version
Client:
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.18
 Git commit:        a224086349
 Built:             Thu Mar 24 08:56:17 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18
  Git commit:       87a90dc786
  Built:            Thu Mar 24 08:56:03 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.6.2
  GitCommit:        de8046a5501db9e0e478e1c10cbcfb21af4c6b2d.m
 runc:
  Version:          1.1.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
In github.com/grafana/scribe, I have this function:
https://github.com/grafana/scribe/blob/fix-docker-mode/plumbing/pipeline/clients/docker/container.go#L102
and I'm expecting the RunContainer function to return when the container exits.
It seems to work pretty consistently for the container created here but not the one created here.
This is what docker ps -a usually reports about the container while AttachToContainer is still waiting to return something:
```
e9052d92db24   ubuntu:latest   "/opt/scribe/pipelin…"   About a minute ago   Exited (0) About a minute ago   scribe--esDVKqiq
```
I have a feeling this may be related to how quickly the container exits before AttachToContainer is called. I'll try adding a sleep before echoing and see what happens.
Edit: confirmed, it's definitely AttachToContainer being called right before or right after the container exits that causes the hang. Do you think there might be a solution to this? I'd be happy to submit a PR.
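(One possible workaround until the race is fixed: inspect the container first and fall back to the logs endpoint when it has already exited. A hedged sketch, where attachOrLogs is a hypothetical helper; note that a container can still exit between the inspect and the attach, so this narrows the race rather than eliminating it:)

```go
import (
	"io"

	docker "github.com/fsouza/go-dockerclient"
)

// attachOrLogs is a hypothetical helper: it avoids attaching to a
// container that has already exited, which appears to trigger the hang.
func attachOrLogs(client *docker.Client, id string, out io.Writer) error {
	container, err := client.InspectContainer(id)
	if err != nil {
		return err
	}
	if !container.State.Running {
		// Already exited: fetch its output via the logs endpoint instead.
		return client.Logs(docker.LogsOptions{
			Container:    id,
			OutputStream: out,
			Stdout:       true,
			Stderr:       true,
		})
	}
	return client.AttachToContainer(docker.AttachToContainerOptions{
		Container:    id,
		OutputStream: out,
		Stdout:       true,
		Stderr:       true,
		Stream:       true,
	})
}
```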
@kminehart hey, sorry for the delayed response. Do you think we'd be able to get a minimal reproducer? If not, I can dig into the code you shared and see if we can figure out exactly what's going on.
I can try to get one today!
@fsouza Much appreciated, here's a link to an example I was able to reproduce it with.
https://gist.github.com/kminehart/e50ae98557adc139ba9648077271367b
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.