How to make dockershim support Clear Containers?
Recently I was trying to make dockershim support cc (Clear Containers), and found that dockershim needs to meet at least the following conditions to support cc:
- Dockershim needs to provide the container type (app or sandbox container) in an annotation that Clear Containers can read. Dockershim already provides this info in the `Label` field of the Docker container struct.
- Docker needs to pass the `Label` of the Docker container to the `Annotations` of the OCI runtime spec. I have sent a PR to the moby project about this, see: moby/moby/pull/36181. (A rough sketch of this propagation is shown after this list.)
- The annotations (labels) set by dockershim need to be recognizable by cc-runtime. The annotation (label) keys are defined as private fields in dockershim and can't be referenced by virtcontainers, so I only modified the code locally for testing.
- Kubelet should be able to get the IP address after the sandbox has started. The IP address of the veth device created by the CNI plugin is removed when cc-runtime creates a tap device linked to it, so we must create the network namespace and get the veth's IP before creating the sandbox container inside that existing network namespace, just like the issue description in cri-containerd. However, Docker does not provide an interface for creating containers in an existing network namespace; we must use the namespace created by Docker, so we can't get the IP via the CNI plugin after setting up the network, since the IP of the veth has already been removed. This is the main obstruction for me at present.
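For illustration, here is a rough sketch (in Go, using the OCI runtime-spec types) of the label-to-annotation propagation the moby PR is about; the package and function names are hypothetical, not the actual moby code:

```go
package ocispec

import (
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// propagateLabels copies Docker container labels into the OCI spec
// annotations so a runtime such as cc-runtime can read the container-type
// hint set by dockershim. Existing annotations are not overwritten.
func propagateLabels(spec *specs.Spec, labels map[string]string) {
	if spec.Annotations == nil {
		spec.Annotations = map[string]string{}
	}
	for k, v := range labels {
		if _, ok := spec.Annotations[k]; !ok {
			spec.Annotations[k] = v
		}
	}
}
```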
After modifying the code for the first three conditions, compiling each component, and setting up the k8s cluster, I could create the pod successfully, but the network of the pod could not be set up correctly. I don't know whether this is related to the conditions above, and I'd appreciate any help here. Thanks!
@sameo @plutoinmii
/cc @guangxuli
cc @egernst
@miaoyq I think we're looking at a couple of potential solutions here:
- Have virtcontainers somehow not clean the veth IP so that `PodSandboxStatus()` would just work.
- Have docker-shim cache the pod IP and use that cache instead of running unnecessary `netns` calls into the pod networking namespace. A pod IP is immutable, so there's no reason to keep checking it.
- Use the Docker engine API (`GET /containers/(id or name)/json`) to ask for the pod container network settings from the Docker engine itself (sketched below).
Solution #2 is obviously the most appealing to us, as #1 would be quite disruptive for CC in general. Solution #3 may work, but since I don't have a working docker-shim based CC deployment, I can't guarantee that. If it does work, it would be a quick and portable solution. #2 would still be more efficient...
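For reference, solution #3 would look roughly like the sketch below, using the official Docker Go client; the function name and error handling are illustrative, and as it turns out later in this thread, the engine may not report the CNI-assigned IP at all:

```go
package example

import (
	"context"
	"fmt"

	"github.com/docker/docker/client"
)

// podIPFromDocker asks the Docker engine for the sandbox container's
// network settings (GET /containers/{id}/json) and returns the IP it
// reports.
func podIPFromDocker(ctx context.Context, sandboxID string) (string, error) {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		return "", err
	}
	defer cli.Close()

	info, err := cli.ContainerInspect(ctx, sandboxID)
	if err != nil {
		return "", err
	}
	if info.NetworkSettings == nil {
		return "", fmt.Errorf("no network settings for container %s", sandboxID)
	}
	return info.NetworkSettings.IPAddress, nil
}
```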
Thanks for the heads up @sameo
Interesting, painful work @miaoyq! I don't have any other input aside from what sameo already pointed out. Are you looking to compare CC with dockershim v cri-containerd?
@sameo Thanks for providing the potential solutions
> Use the Docker engine API (`GET /containers/(id or name)/json`) to ask for the pod container network settings from the Docker engine itself.
The third solution doesn't work, since the sandbox network is set up by the CNI plugin, so Docker cannot see the IP of the veth. I have verified this locally. :-)
The second solution may be the best one. I will also try to find a better way to get the sandbox IP on the dockershim side.
> Are you looking to compare CC with dockershim v cri-containerd?
@egernst Yeah. k8s + cri-containerd (containerd) + cc works normally. Currently, the main difference between dockershim and cri-containerd is how the sandbox network is created.
@miaoyq There already is a per-pod map in the `dockerService` to track whether a pod's network is ready. It would seem logical to add an IP string there in order to track both the network readiness and its IP.
It should not be a big effort either, so it's probably worth it.
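A minimal sketch of what that cache might look like, assuming a simple map keyed by sandbox ID; the type and method names are hypothetical, not the actual dockershim code:

```go
package example

import "sync"

// podNetworkCache combines the existing "network ready" flag with the pod
// IP, so later status queries never need to enter the pod netns again.
type podNetworkCache struct {
	mu    sync.Mutex
	state map[string]podNetwork // keyed by sandbox ID
}

type podNetwork struct {
	ready bool
	ip    string
}

func newPodNetworkCache() *podNetworkCache {
	return &podNetworkCache{state: map[string]podNetwork{}}
}

// setReady records the IP obtained right after the CNI plugin set up the
// sandbox network.
func (c *podNetworkCache) setReady(sandboxID, ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.state[sandboxID] = podNetwork{ready: true, ip: ip}
}

// ip returns the cached pod IP and whether the network is ready.
func (c *podNetworkCache) ip(sandboxID string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	n, ok := c.state[sandboxID]
	return n.ip, ok && n.ready
}
```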
@sameo Yeah, it's a good idea.
Actually, dockershim already provides a way to store pod data: when running a sandbox, the `checkpointHandler` of `dockerService` creates a checkpoint for the pod, and that checkpoint could be used to track the pod IP. :-)
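A rough sketch of that idea, assuming a JSON checkpoint file per sandbox; the struct and on-disk layout are illustrative, not the real dockershim checkpoint format:

```go
package example

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// sandboxCheckpoint is an illustrative stand-in for the data persisted per
// sandbox; PodIP is the field that would be added to carry the
// CNI-assigned IP across dockershim restarts.
type sandboxCheckpoint struct {
	Name        string `json:"name"`
	Namespace   string `json:"namespace"`
	HostNetwork bool   `json:"host_network,omitempty"`
	PodIP       string `json:"pod_ip,omitempty"`
}

// writeCheckpoint stores the checkpoint as a JSON file named after the
// sandbox ID.
func writeCheckpoint(dir, sandboxID string, cp *sandboxCheckpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, sandboxID), data, 0o600)
}

// readCheckpoint loads the checkpoint back, e.g. when a status query needs
// the pod IP without entering the pod network namespace.
func readCheckpoint(dir, sandboxID string) (*sandboxCheckpoint, error) {
	data, err := os.ReadFile(filepath.Join(dir, sandboxID))
	if err != nil {
		return nil, err
	}
	var cp sandboxCheckpoint
	if err := json.Unmarshal(data, &cp); err != nil {
		return nil, err
	}
	return &cp, nil
}
```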
@sameo Today I ran a container via docker + cc and got the container pid with `docker inspect -f '{{.State.Pid}}' [container_id]`. I found that the pid of the container is not the pid of the `qemu` process (in the container network namespace) but the pid of the `cc-shim` process (in the host network namespace).
In dockershim, the container pid is used to build the `netns` path of the sandbox container, so dockershim can't get the right `netns` path currently.
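For context, the `netns` path is derived from the pid roughly like this (the helper name is illustrative); with cc the reported pid is the `cc-shim` process in the host namespace, so the resulting path does not point at the pod's network namespace:

```go
package example

import "fmt"

// netnsPathFromPid builds the network namespace path for the process the
// Docker engine reports as the container's pid.
func netnsPathFromPid(pid int) string {
	return fmt.Sprintf("/proc/%d/ns/net", pid)
}
```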
Could we get the pid of the `qemu` process instead of the `cc-shim` process when cc-runtime creates a VM-based container?
@miaoyq we actually don't want to provide the pid of the qemu process since this is related to the pod and not to the container itself.
What we could do to solve your issue is to make sure that we run the shim in the right network namespace. Look into the virtcontainers shim implementation: `cc_shim.go` points to some functions in `shim.go`. This is where we should spawn the shim process inside the right namespace.
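Something along these lines, sketched here with the vishvananda/netns package; the function name, arguments, and use of that package are assumptions, not the actual virtcontainers code:

```go
package example

import (
	"os/exec"
	"runtime"

	"github.com/vishvananda/netns"
)

// startShimInNetns joins the sandbox network namespace before spawning the
// shim, so the shim (and therefore the pid Docker reports) lives inside the
// pod netns.
func startShimInNetns(netnsPath, shimPath string, args ...string) error {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// Remember the host namespace so we can switch back afterwards.
	origNS, err := netns.Get()
	if err != nil {
		return err
	}
	defer origNS.Close()
	defer netns.Set(origNS)

	targetNS, err := netns.GetFromPath(netnsPath)
	if err != nil {
		return err
	}
	defer targetNS.Close()

	// Move the current thread into the pod network namespace; the child
	// process forked below inherits it.
	if err := netns.Set(targetNS); err != nil {
		return err
	}

	cmd := exec.Command(shimPath, args...)
	return cmd.Start()
}
```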
Please open an issue on virtcontainers repo for that, so that we can discuss further on this.
Thanks @sboeuf, I also think it's reasonable to run the shim in the container network namespace. :) I have filed an issue on the virtcontainers repo. See: https://github.com/containers/virtcontainers/issues/615
@sameo @sboeuf BTW, could we reset the network for an existing cc container whose network wasn't set up when it was created?
I think I have got the answer in https://github.com/containers/virtcontainers/issues/241. Looking forward to this feature. :-)