go-dockerclient icon indicating copy to clipboard operation
go-dockerclient copied to clipboard

AttachContainer sometimes hangs

Open gravis opened this issue 11 years ago • 25 comments

Hi,

I'm fighting with a nasty bug. Sometimes, but not always (a bit hard to reproduce), AttachContainer will hangs, eating 100% of CPU. I have profiled my program and ended up with this (profiled during the hang):

(pprof) top
Total: 3002 samples
    1906  63.5%  63.5%     3002 100.0% github.com/fsouza/go-dockerclient/utils.StdCopy
    1096  36.5% 100.0%     1096  36.5% bufio.(*Reader).Read
       0   0.0% 100.0%     3002 100.0% github.com/fsouza/go-dockerclient.func·001
       0   0.0% 100.0%     3002 100.0% runtime.gosched0

I'm using the latest go-dockerclient (master / e21b2a4562b3ec0731c54e59ab56c41c79067e12), and having the same problem with older versions (tried until 992e6abef8f797f41e1d7c90351056e3da5e141a)

My code is pretty standard:

  var buf bytes.Buffer
  opts := docker.AttachToContainerOptions{
      Container:    w.container.ID,
      OutputStream: &buf,
      Logs:         true,
      Stdout:       true,
      Stderr:       true,
  }
  err = w.client.AttachToContainer(opts)
  if err != nil {
      log.Printf("Error while fetching logs from container: %s\n", err)
  }

and works "most of the time" (99% of runs).

I'm not sure if it's related to docker, but here's my configuration:

Containers: 0
Images: 206
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 206
Execution Driver: native-0.2
Kernel Version: 3.2.0-4-amd64 

Any hint would be appreciated :)

gravis avatar Jul 08 '14 12:07 gravis

Which docker version are you using? I will try to reproduce. If you're using docker>=0.11, does it happen with the method Logs as well? It uses a different endpoint in the Docker API.

Is your container running anything special? Or are you able to reproduce it with any command?

fsouza avatar Jul 08 '14 13:07 fsouza

I didn't know the Logs commands, thanks for pointing it. Will try and report in a few minutes.

My container doesn't run anything special, just a shell script to get bower dependencies, and it doesn't always hangs if I restart the process with the same params, just sometimes.

I'm running docker 1.1.0. By the way, the container is stopped (Exited (0) 2 minutes ago) when I try to fetch to output (logs).

gravis avatar Jul 08 '14 13:07 gravis

Ok, so with Logs, it's a bit better. I'm killing our api server, because most of the results are invalid json. There's a weird char before the response from the container:

buf.String() ��{"name":"angular-resource","versions":["1.3.0-build.2893+sha.5963b5c"," [...]

(from code:

var buf bytes.Buffer
opts := docker.LogsOptions{
    Container:    w.container.ID,
    OutputStream: &buf,
    Stdout:       true,
    Stderr:       true,
}
err = w.client.Logs(opts)
if err != nil {
    log.Printf("Error while fetching logs from container: %s\n", err)
}

fmt.Printf("buf.String() %+v\n", buf.String())

Where docker run simply returns:

{"name":"angular-resource","versions":["1.3.0-build.2893+sha.5963b5c"," [...]

gravis avatar Jul 08 '14 13:07 gravis

@gravis that's bizarre.

I'm able to reproduce the bug with the strange caracter in the method Logs, I will have a look.

I'm trying to reproduce now the attach issue.

Thanks for the reports!

fsouza avatar Jul 08 '14 14:07 fsouza

That's "good" news, at least it's not related to my docker install nor my image/container :) Thanks again for your time and patience.

gravis avatar Jul 08 '14 14:07 gravis

@gravis the bug is with the docker multiplexing capability. We need to interpret that character and find out whether it's stdout or stderr. utils.StdCopy does that for us.

I'll push a fix soon. Still trying to figure out the AttachToContainer issue.

fsouza avatar Jul 08 '14 15:07 fsouza

Great!

Regarding AttachToContainer, were you able to reproduce it at least once?

gravis avatar Jul 08 '14 15:07 gravis

@gravis nope, not yet :(

fsouza avatar Jul 09 '14 13:07 fsouza

I fixed a bug in StdCopy implementation, I don't think it would hang, but it may be related.

I've also fixed the Logs issue. Can you give it another try (both for Logs and AttachToContainer)?

fsouza avatar Jul 09 '14 14:07 fsouza

Great!

gravis avatar Jul 09 '14 14:07 gravis

First test: Logs is hanging like AttachContainer now (+CPU @100%), so the issue is probably located around hijack or StdCopy

gravis avatar Jul 09 '14 14:07 gravis

Logs doesn't use hijack, so it's probably StdCopy. I'm still unable to reproduce, but I think we're closer to the solution now.

Please try setting RawTerminal: true on LogsOptions (or AttachToContainerOptions), you will get the bad characters again, but StdCopy won't get called.

fsouza avatar Jul 09 '14 14:07 fsouza

Ouch, AttachContainer is returning weird chars now from container logs :(

d┘▒┼±⎺↑ce┌e⎼≤==2↓2↓7
d┘▒┼±⎺↑┐⎺└b┤>=▮↓9↓4
└☃└e⎻▒⎼⎽e>=▮↓1↓3
⎻≤├▒⎺┼↑d▒├e┤├☃┌==1↓5
⎻d°└☃┼e⎼==2▮11▮515
stripe==1.12.0
⎻≤c⎼≤⎻├⎺==2↓6
b⎺├⎺==2↓13↓3
d┘▒┼±⎺↑├☃└e≥⎺┼e⎽==▮↓2
d┘▒┼±⎺↑⎽e⎽==▮↓4↓1
Ge⎼▒┌d⎺==▮↓4↓16
d┘▒┼±⎺↑├▒⎽├≤⎻☃e==▮↓9↓12
┘⎺▒┼┼≤↑c▒c▒e==1↓4
d⎺c┤├☃┌⎽==▮↓9↓1
chardet==2.1.1
dj-database-url==0.2.1
psycopg2==2.4.5
gunicorn==0.17.2
django-s3-folder-storage==0.1
django-storages==1.1.5
python-memcached==1.48
W▒⎺⎺⎽▒==2↓4↓1
⎽☃└⎻┌e↑⎽▒┌e⎽°⎺⎼ce==▮↓51

gravis avatar Jul 09 '14 15:07 gravis

(we should probably create a branch to test, master is getting messy :()

gravis avatar Jul 09 '14 15:07 gravis

@gravis with or without RawTerminal? Seems like it's working fine here. I will run some tsuru integration tests against go-dockerclient's master.

Were you able to test it with RawTerminal: true? Does it hang?

fsouza avatar Jul 09 '14 16:07 fsouza

Definitely voting for an issue with StdCopy. It works fine (so far) with RawTerminal: true! Let's wait another hour to be sure.

gravis avatar Jul 09 '14 16:07 gravis

It would have hanged already. Using RawTerminal fixed it. Thanks There's something to fix in StdCopy :(

gravis avatar Jul 09 '14 16:07 gravis

Good news, I've just managed to reproduce the hang in an automated test. It fails whenever StdCopy needs to grow the buffer. I will push a fix soon :)

fsouza avatar Jul 09 '14 20:07 fsouza

You're my hero :)

gravis avatar Jul 09 '14 21:07 gravis

I think this may still be happening.

grafana-kminehart-linux ~ » docker version
Client:
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.18
 Git commit:        a224086349
 Built:             Thu Mar 24 08:56:17 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18
  Git commit:       87a90dc786
  Built:            Thu Mar 24 08:56:03 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.6.2
  GitCommit:        de8046a5501db9e0e478e1c10cbcfb21af4c6b2d.m
 runc:
  Version:          1.1.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

In github.com/grafana/scribe, I have this function:

https://github.com/grafana/scribe/blob/fix-docker-mode/plumbing/pipeline/clients/docker/container.go#L102

and I'm expecting the RunContainer function to return when the container exits.

It seems to work pretty consistently for the container created here but not the one created here.

This is what docker ps -a usually reports about the container while AttachToContainer is still waiting to return something:

e9052d92db24   ubuntu:latest   "/opt/scribe/pipelin…"   About a minute ago   Exited (0) About a minute ago             scribe--esDVKqiq

I have a feeling this may be related to how quickly the container exits before "AttachToContainer" is called. I'll try adding a sleep before echoing and seeing what happens.


Edit: confirmed, it's definitely "AttachToContainer" being called right before or right after the container has exited, causing it to hang. Do you think there might be a solution to this? I'd be happy to submit a PR.

kminehart avatar Jun 20 '22 23:06 kminehart

@kminehart hey, sorry for the delayed response. Do you think we'd be able to get a minimal reproducer? If not, I can dig into the code you shared and see if we can figure out exactly what's going on.

fsouza avatar Jun 26 '22 23:06 fsouza

I can try to get one today!

kminehart avatar Jun 28 '22 19:06 kminehart

@fsouza Much appreciated, here's a link to an example I able to reproduce it with.

https://gist.github.com/kminehart/e50ae98557adc139ba9648077271367b

kminehart avatar Jun 28 '22 21:06 kminehart

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Jul 31 '22 08:07 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Nov 02 '22 07:11 stale[bot]