
nerdctl ps slows down and errors with 350+ containers

Open mac-chaffee opened this issue 4 years ago • 4 comments

I'm running a load test on my Kubernetes cluster using ClusterLoader2, which just runs a bunch of pause containers on each node. With around 350 containers, nerdctl ps becomes far slower than ctr (20-34 seconds versus about 0.1 seconds) and exits with an error:

$ time sudo /usr/local/bin/nerdctl --debug-full -n k8s.io ps
FATA[0020] container "4690dc5561d03a5c89453546a3c5d7c0a7ce7c3938cac1560cf358a2c6c040e9" in namespace "k8s.io": not found

real	0m20.374s
user	0m0.284s
sys	0m0.117s
$ time sudo /usr/local/bin/nerdctl --debug-full -n k8s.io container ls
FATA[0034] container "4a08c9bdba3dc064ad82fbd583f992d409dd8ee3346bfd413ea010cae1f43030" in namespace "k8s.io": not found

real	0m34.376s
user	0m0.284s
sys	0m0.132s
$ time sudo ctr -n k8s.io c ls | wc -l
354

real	0m0.126s
user	0m0.081s
sys	0m0.091s

Also notice that the command fails due to a container being removed while the command was running (which becomes more likely the longer the command takes).

I think the race condition is caused by ps.go calling c.Spec on each container after fetching the list of containers, meaning that if a container is removed before we can inspect it, the command will error. Could be fixed by skipping the removed container rather than erroring if the error is "not found": https://github.com/containerd/nerdctl/blob/cee3b6a4840db6b5dd4019ef343af7bf4ba5c940/cmd/nerdctl/ps.go#L118-L122
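A minimal sketch of the "skip instead of error" idea, not the actual ps.go code. The `container` type, its `spec` method, and `errNotFound` are hypothetical stand-ins so the example runs on its own; real code would presumably test the error with containerd's `errdefs.IsNotFound` rather than a local sentinel.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for containerd's "not found" error.
var errNotFound = errors.New("not found")

// container is a hypothetical, simplified stand-in for a container handle.
type container struct {
	id      string
	removed bool
}

// spec mimics a per-container lookup that fails once the container is gone.
func (c container) spec() (string, error) {
	if c.removed {
		return "", fmt.Errorf("container %q: %w", c.id, errNotFound)
	}
	return "spec-of-" + c.id, nil
}

// listSpecs collects specs and skips containers that vanished between
// the initial listing and the per-container lookup.
func listSpecs(containers []container) ([]string, error) {
	specs := make([]string, 0, len(containers))
	for _, c := range containers {
		s, err := c.spec()
		if errors.Is(err, errNotFound) {
			continue // removed while we were iterating: skip, do not fail
		}
		if err != nil {
			return nil, err // any other error is still fatal
		}
		specs = append(specs, s)
	}
	return specs, nil
}

func main() {
	specs, err := listSpecs([]container{{id: "a"}, {id: "b", removed: true}, {id: "c"}})
	fmt.Println(specs, err)
}
```

Only the "not found" case is swallowed here, so unrelated failures still surface to the user.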

Not sure what to do about the performance issue, though, if we have to make O(n) requests (one Spec call per container). Maybe we could do some of those requests in parallel?
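One way the parallel idea could look, as a sketch under assumptions: `fetchSpec` is a hypothetical stand-in for the per-container Spec request, and the concurrency limit is an arbitrary choice rather than anything nerdctl defines.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchSpec is a hypothetical stand-in for a per-container Spec request.
func fetchSpec(id string) (string, error) {
	return "spec-of-" + id, nil
}

// fetchAll calls fetch for every ID with at most limit requests in flight.
// Results and errors are returned in the same order as ids.
func fetchAll(ids []string, limit int, fetch func(string) (string, error)) ([]string, []error) {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block forever
	}
	specs := make([]string, len(ids))
	errs := make([]error, len(ids))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit requests are already running
		go func(i int, id string) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes only to its own index, so no mutex is needed.
			specs[i], errs[i] = fetch(id)
		}(i, id)
	}
	wg.Wait()
	return specs, errs
}

func main() {
	specs, errs := fetchAll([]string{"a", "b", "c"}, 2, fetchSpec)
	fmt.Println(specs, errs)
}
```

Keeping per-container errors separate, instead of aborting on the first one, also leaves room to skip containers that disappear mid-listing.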

mac-chaffee, Aug 26 '21 21:08

> Maybe we could do some of those requests in parallel?

SGTM. We should also skip inspecting c.Spec when --quiet is set
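A rough sketch of the --quiet shortcut, with hypothetical names throughout: `row`, `buildRows`, and the `lookupDetail` callback stand in for the real listing code and the c.Spec call. When only IDs are printed, the per-container lookup is never made.

```go
package main

import "fmt"

// row is a hypothetical output row for the container listing.
type row struct {
	ID     string
	Detail string
}

// buildRows assembles the listing. In quiet mode only IDs are needed,
// so the per-container lookup is skipped entirely.
func buildRows(ids []string, quiet bool, lookupDetail func(id string) (string, error)) ([]row, error) {
	rows := make([]row, 0, len(ids))
	for _, id := range ids {
		r := row{ID: id}
		if !quiet {
			d, err := lookupDetail(id)
			if err != nil {
				return nil, err
			}
			r.Detail = d
		}
		rows = append(rows, r)
	}
	return rows, nil
}

func main() {
	calls := 0
	lookup := func(id string) (string, error) {
		calls++ // count how many per-container lookups were made
		return "detail-of-" + id, nil
	}
	rows, err := buildRows([]string{"a", "b"}, true, lookup)
	fmt.Println(len(rows), calls, err)
}
```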

AkihiroSuda, Aug 27 '21 07:08

I looked a bit into this. In the Docker implementation, only one call to the daemon happens. Here: https://github.com/docker/cli/blob/3dad26ca2d418092b8c4e01b03d0455d583bec86/cli/command/container/list.go#L122

In the nerdctl implementation, we make O(n) calls to c.Spec to achieve the same. My question: is c.Spec a call to containerd? If it's not, it shouldn't slow this operation down. The only place I'm sure makes O(n) calls to containerd is https://github.com/containerd/nerdctl/blob/e83e18b98e89c7f5948c5777ab3ca0068299e703/cmd/nerdctl/ps.go#L234-L235

But that only happens with --size or --format=wide, so this can't be it.

I tried running ~200 nginx containers on my machine, and nerdctl ps returned quickly (<2 seconds), so I can't reproduce the slowdown.

yardenshoham, Sep 23 '22 15:09