
net: netns mount/namespaces 'hang around' after containers stopped/deleted

Status: Open. grahamwhaley opened this issue 6 years ago (5 comments).

Description of problem

After running some test cases, `mount` shows that a number of nsfs mounts are still present on `/run/docker/netns/XXXXX`, where `XXXXX` are pod IDs.

This can be reproduced with the test case in https://github.com/clearcontainers/tests/pull/491, or with the simpler case provided by @sboeuf:

$ for i in $(seq 1 50); do docker run -tid nginx; done; for cid in $(docker ps -a -q); do time docker rm -f $cid; done
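For varying the container count and recording how long each removal takes, the same loop can be driven from Python. This is a minimal sketch, assuming the `docker` CLI is on `PATH`; `reproduce` and its `run` parameter are hypothetical names introduced here, not part of any Clear Containers tooling. Like the shell loop, it force-removes every container on the host, not only the ones it started.

```python
import subprocess
import time
from typing import Callable, List, Tuple


def reproduce(count: int, image: str = "nginx",
              run: Callable = subprocess.run) -> List[Tuple[str, float]]:
    """Start `count` detached containers, then force-remove every container.

    Mirrors the shell loop above. Returns (container ID, seconds taken by its
    `docker rm -f`) for each removal, in removal order. `run` is injectable
    (hypothetical parameter) so the sequencing can be exercised without Docker.
    """
    # Start phase: equivalent to `docker run -tid nginx`, repeated `count` times.
    for _ in range(count):
        run(["docker", "run", "-tid", image], check=True, capture_output=True)

    # List all container IDs, as `docker ps -a -q` does in the shell loop.
    listing = run(["docker", "ps", "-a", "-q"], check=True,
                  capture_output=True, text=True)
    ids = listing.stdout.split()

    # Removal phase: one `docker rm -f <id>` per container, timed individually
    # (the Python counterpart of `time docker rm -f $cid`).
    timings = []
    for cid in ids:
        start = time.monotonic()
        run(["docker", "rm", "-f", cid], check=True, capture_output=True)
        timings.append((cid, time.monotonic() - start))
    return timings
```

For example, `for cid, secs in reproduce(50): print(cid, f"{secs:.2f}s")` prints one line per removal, which makes it easy to see whether removals slow down as the count grows.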

Expected result

I would not expect any namespace or network mounts to be left behind. Additionally, I think the matching Docker elements may also be visible with `docker network inspect bridge`.
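To check the `docker network inspect bridge` idea, the command's JSON output can be compared against the live container list. This is a sketch under two assumptions: that the output is a JSON array of network objects with a `Containers` map keyed by full container ID, and that leftover entries there correspond to the leaked namespaces (which is only a guess at this point). The helper names are hypothetical.

```python
import json
from typing import List


def attached_containers(inspect_json: str) -> List[str]:
    """Container IDs listed under "Containers" in `docker network inspect` output."""
    networks = json.loads(inspect_json)  # the command prints a JSON array
    ids: List[str] = []
    for net in networks:
        # "Containers" may be empty or null when nothing is attached.
        ids.extend((net.get("Containers") or {}).keys())
    return ids


def stale_endpoints(inspect_json: str, live_ids: List[str]) -> List[str]:
    """IDs attached to the network that match no live container.

    `docker ps -a -q` prints short IDs while inspect uses full IDs,
    so a live ID matches if it is a prefix of the attached ID.
    """
    return [cid for cid in attached_containers(inspect_json)
            if not any(cid.startswith(live) for live in live_ids if live)]
```

The inputs could come from `subprocess.run(["docker", "network", "inspect", "bridge"], capture_output=True, text=True, check=True).stdout` and the split output of `docker ps -a -q`.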

Actual result

As above, `nsfs on /run/docker/netns/XXXXX type nsfs (rw)` entries are still shown by `mount`.
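Counting the leftover mounts by hand gets tedious across many runs. A minimal sketch of a checker, assuming it reads `/proc/self/mounts` (whose lines are `<source> <mount point> <fs type> <options> <dump> <pass>`) rather than parsing the human-readable `mount` output, and that it runs in the same mount namespace as dockerd; `leftover_netns` is a hypothetical helper name.

```python
from typing import Iterable, List

NETNS_DIR = "/run/docker/netns/"


def leftover_netns(mount_lines: Iterable[str], prefix: str = NETNS_DIR) -> List[str]:
    """Return mount points of type nsfs under `prefix`.

    Each line is expected in /proc/mounts format:
    <source> <mount point> <fs type> <options> <dump> <pass>
    """
    found = []
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        if fstype == "nsfs" and mount_point.startswith(prefix):
            found.append(mount_point)
    return found
```

On a test machine, `with open("/proc/self/mounts") as f: print(len(leftover_netns(f)))` gives the number of leaked netns mounts after a run; it should be 0 once all containers are removed.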


(It is difficult for me to get the cc info in here as it is on a different machine; I will attach it afterwards.) Produced with release 3.0.2 on Ubuntu 16.04, using overlay2 as the graph backend.

grahamwhaley, Oct 12 '17 13:10

`cc-collect-data.sh` output attached: collect.txt

/cc @mcastelino

grahamwhaley, Oct 12 '17 14:10

@grahamwhaley I see this happen even without running any soak tests. I am running tip-of-tree code for all components. Let me see what is different on my system that causes this.

mcastelino, Oct 12 '17 17:10

This only seems to happen for me when I delete many containers. For instance, using the simple script above with 5, 10 or 20 containers, and running `docker rm` either one container at a time or all at once (a single command deleting all containers), I don't see the problem. If I raise the count to 70 and use a single `docker rm`, then 46 namespaces are left. Also, the soak test always seems to fail on 'container-61', which I think is the 62nd container launched. Odd! (But in my head 62 is pretty near 64, which is a nice binary-ish number.)

If no more evidence turns up, I'll try some more 'binary chop' analysis tomorrow to see whether I can identify the threshold.
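The 'binary chop' is a bisection over the container count. A minimal sketch, assuming a hypothetical predicate `leaks(n)` (for example: run the reproduction with `n` containers, then report whether any nsfs mounts remain under `/run/docker/netns`), and assuming leaking is monotonic in `n`. The failure may be timing-dependent, so probes near the boundary are worth repeating.

```python
from typing import Callable


def first_leaking_count(leaks: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest container count in (lo, hi] for which leaks() is True.

    Requires leaks(lo) is False and leaks(hi) is True, and assumes that if
    n containers leak, every larger count leaks too.
    """
    if leaks(lo) or not leaks(hi):
        raise ValueError("need leaks(lo) == False and leaks(hi) == True")
    # Invariant: leaks(lo) is False and leaks(hi) is True.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if leaks(mid):
            hi = mid  # threshold is at or below mid
        else:
            lo = mid  # threshold is above mid
    return hi
```

Starting from the observations above (20 containers clean, 70 leaking), `first_leaking_count(leaks, 20, 70)` needs about six probes in addition to the two endpoint checks.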

grahamwhaley, Oct 12 '17 18:10

@grahamwhaley I am on FC24 running rawhide.

mcastelino, Oct 12 '17 18:10

I'm slightly confused by your info, @mcastelino: I too am running at the HEAD of the code base, but I am on Ubuntu 16.04. I suspect this problem is not tied to a particular distro or code point; I think the issue has been around for some time (we just never logged it as repeatable). I binary-chopped with the simple test above: with 43 containers it works and no netns mounts are left; with 44, 18 netns mounts are left over. My gut feeling is that this is more likely a timeout issue: more containers make our runtime take longer, as it iterates the list doing some searches, which in turn increases the time taken by operations such as shutdown and remove.

I won't get back to this until early next week now, but if there are any clues about what to look for (for example, who tears down the net namespace and where: is it the runtime, Docker, CNI post hooks, etc.?), please add them here and I can dig in and gather more info.

grahamwhaley, Oct 13 '17 10:10