
Bug report: leaking container while doing pressure test

Open WeiZhang555 opened this issue 9 years ago • 3 comments

  • Test case:
  1. Run runv-containerd with the docker daemon
  2. Start 1000 containers, then "docker rm" all of them
  • Expected result: All containers are removed and none are left behind
  • Actual result: Some containers can't be removed and stay on the system forever, unless you restart the docker daemon.
$ docker rm -f ea3bcc75e63a
Error response from daemon: Could not kill running container ea3bcc75e63adb0ad33fe59fc733af6cac9f6f484dabd33b3e798af3f458250e, cannot remove - Cannot kill container ea3bcc75e63adb0ad33fe59fc733af6cac9f6f484dabd33b3e798af3f458250e: rpc error: code = 2 desc = "The container ea3bcc75e63adb0ad33fe59fc733af6cac9f6f484dabd33b3e798af3f458250e or the process init is not found"
  • Frequency: intermittent, not on every run. If it does not reproduce, run the test again.
  • Root cause: https://github.com/hyperhq/runv/blob/master/supervisor/hyperpod.go#L415

c.run(p) starts the VM and the container in a goroutine but never returns the error. If the VM fails to start, runv-containerd still sends a success response to docker, so docker treats the container as running when it is not. That's why docker can neither kill it nor remove it: it's trying to kill nothing!

When I dug deeper, I found that https://github.com/hyperhq/runv/blob/master/supervisor/container.go#L35 sometimes hangs. I can't find the real cause of the hang. PLEASE find it, it's really vital!

One extra thing:

https://github.com/hyperhq/runv/blob/master/supervisor/container.go#L71 should return an error instead of nil. I'll send a patch for this.

Note: I was testing on our internal version, which diverges a little from the latest upstream version, but I believe the problem is still there.

WeiZhang555 avatar Oct 21 '16 03:10 WeiZhang555

When I try to dig deeper, I found that sometimes https://github.com/hyperhq/runv/blob/master/supervisor/container.go#L35 will hang

I ran into this too when integrating cri-o. It seems runv hangs when startup fails before the container can start.

Crazykev avatar Oct 21 '16 07:10 Crazykev

The related code has changed a little. Could you check it again, please?

laijs avatar May 17 '17 10:05 laijs

I'll try tomorrow :-)

WeiZhang555 avatar May 17 '17 14:05 WeiZhang555