How to make dockershim support Clear Containers?
Recently I was trying to make dockershim support cc (Clear Containers), and found that dockershim needs to meet at least the following conditions to support cc:
- Dockershim needs to provide the container type (app or sandbox container) in an annotation that Clear Containers can read. Dockershim already provides this info in the `Label` field of the Docker container struct.
- Docker needs to pass the `Label` of the Docker container to the `Annotations` of the OCI runtime spec. I have sent a PR to the moby project about this, see: moby/moby/pull/36181. (A rough sketch of this propagation is shown after this list.)
- The annotations (labels) set by dockershim need to be recognizable by cc-runtime. The annotation (label) keys are defined as private fields in dockershim and can't be referenced by virtcontainers, so I only modified the code locally for testing.
- Kubelet should be able to get the IP address after the sandbox has started. The IP address of the veth device created by the CNI plugin is removed when cc-runtime creates a tap device linked to it, so we must create the network namespace and get the veth's IP before creating the sandbox container inside that existing network namespace, just like the issue description in cri-containerd. However, Docker does not provide an interface for creating containers in an existing network namespace; we must use the namespace created by Docker, so we can't get the IP via the CNI plugin after setting up the network, since the IP of the veth has already been removed. This is the main obstruction for me at present.
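For illustration, here is a rough sketch (in Go, using the OCI runtime-spec types) of the label-to-annotation propagation the moby PR is about; the package and function names are hypothetical, not the actual moby code:

```go
package ocispec

import (
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// propagateLabels copies Docker container labels into the OCI spec
// annotations so a runtime such as cc-runtime can read the container-type
// hint set by dockershim. Existing annotations are not overwritten.
func propagateLabels(spec *specs.Spec, labels map[string]string) {
	if spec.Annotations == nil {
		spec.Annotations = map[string]string{}
	}
	for k, v := range labels {
		if _, ok := spec.Annotations[k]; !ok {
			spec.Annotations[k] = v
		}
	}
}
```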
After modifying the code for the first three conditions, compiling each component, and setting up the k8s cluster, I could create the pod successfully, but the network of the pod could not be set up correctly. I don't know whether this is related to the conditions above, and I'd appreciate any help here. Thanks!
@sameo @plutoinmii
/cc @guangxuli
cc @egernst
@miaoyq I think we're looking at a couple of potential solutions here:
- Have virtcontainers somehow not clean the veth IP so that `PodSandboxStatus()` would just work.
- Have docker-shim cache the pod IP and use that cache instead of running unnecessary `netns` calls into the pod networking namespace. A pod IP is immutable, so there's no reason to keep checking it.
- Use the Docker engine API (`GET /containers/(id or name)/json`) to ask for the pod container network settings from the Docker engine itself (sketched below).
Solution #2 is obviously the most appealing to us, as #1 would be quite disruptive for CC in general. Solution #3 may work, but since I don't have a working docker-shim based CC deployment, I can't guarantee that. If it does work, it would be a quick and portable solution. #2 would still be more efficient...
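For reference, solution #3 would look roughly like the sketch below, using the official Docker Go client; the function name and error handling are illustrative, and as it turns out later in this thread, the engine may not report the CNI-assigned IP at all:

```go
package example

import (
	"context"
	"fmt"

	"github.com/docker/docker/client"
)

// podIPFromDocker asks the Docker engine for the sandbox container's
// network settings (GET /containers/{id}/json) and returns the IP it
// reports.
func podIPFromDocker(ctx context.Context, sandboxID string) (string, error) {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		return "", err
	}
	defer cli.Close()

	info, err := cli.ContainerInspect(ctx, sandboxID)
	if err != nil {
		return "", err
	}
	if info.NetworkSettings == nil {
		return "", fmt.Errorf("no network settings for container %s", sandboxID)
	}
	return info.NetworkSettings.IPAddress, nil
}
```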
Thanks for the heads up @sameo
Interesting, painful work @miaoyq! I don't have any other input aside from what sameo already pointed out. Are you looking to compare CC with dockershim v cri-containerd?
@sameo Thanks for providing the potential solutions
> Use the Docker engine API (`GET /containers/(id or name)/json`) to ask for the pod container network settings from the Docker engine itself.
The third solution doesn't work, since the sandbox network is set up by the CNI plugin, so Docker cannot see the IP of the veth. I have verified this locally. :-)
The second solution may be the best one. I will also try to find a better way to get the sandbox IP on the dockershim side.
> Are you looking to compare CC with dockershim v cri-containerd?
@egernst Yeah. k8s + cri-containerd (containerd) + cc works normally. Currently, the main difference between dockershim and cri-containerd is how the sandbox network is created.
@miaoyq There already is a per-pod map in the `dockerService` to track whether a pod's network is ready. It would seem logical to add an IP string there in order to track both the network readiness and its IP.
It should not be a big effort either, so it's probably worth it.
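A minimal sketch of what that cache might look like, assuming a simple map keyed by sandbox ID; the type and method names are hypothetical, not the actual dockershim code:

```go
package example

import "sync"

// podNetworkCache combines the existing "network ready" flag with the pod
// IP, so later status queries never need to enter the pod netns again.
type podNetworkCache struct {
	mu    sync.Mutex
	state map[string]podNetwork // keyed by sandbox ID
}

type podNetwork struct {
	ready bool
	ip    string
}

func newPodNetworkCache() *podNetworkCache {
	return &podNetworkCache{state: map[string]podNetwork{}}
}

// setReady records the IP obtained right after the CNI plugin set up the
// sandbox network.
func (c *podNetworkCache) setReady(sandboxID, ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.state[sandboxID] = podNetwork{ready: true, ip: ip}
}

// ip returns the cached pod IP and whether the network is ready.
func (c *podNetworkCache) ip(sandboxID string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	n, ok := c.state[sandboxID]
	return n.ip, ok && n.ready
}
```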
@sameo Yeah, it's a good idea.
Actually, dockershim already provides a way to store pod data: when running a sandbox, the `checkpointHandler` of `dockerService` creates a checkpoint for the pod, and that checkpoint could be used to track the pod IP. :-)
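A rough sketch of that idea, assuming a JSON checkpoint file per sandbox; the struct and on-disk layout are illustrative, not the real dockershim checkpoint format:

```go
package example

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// sandboxCheckpoint is an illustrative stand-in for the data persisted per
// sandbox; PodIP is the field that would be added to carry the
// CNI-assigned IP across dockershim restarts.
type sandboxCheckpoint struct {
	Name        string `json:"name"`
	Namespace   string `json:"namespace"`
	HostNetwork bool   `json:"host_network,omitempty"`
	PodIP       string `json:"pod_ip,omitempty"`
}

// writeCheckpoint stores the checkpoint as a JSON file named after the
// sandbox ID.
func writeCheckpoint(dir, sandboxID string, cp *sandboxCheckpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, sandboxID), data, 0o600)
}

// readCheckpoint loads the checkpoint back, e.g. when a status query needs
// the pod IP without entering the pod network namespace.
func readCheckpoint(dir, sandboxID string) (*sandboxCheckpoint, error) {
	data, err := os.ReadFile(filepath.Join(dir, sandboxID))
	if err != nil {
		return nil, err
	}
	var cp sandboxCheckpoint
	if err := json.Unmarshal(data, &cp); err != nil {
		return nil, err
	}
	return &cp, nil
}
```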
@sameo Today I ran a container via docker + cc and got the container pid with `docker inspect -f '{{.State.Pid}}' [container_id]`. I found that the pid of the container is not the pid of the `qemu` process (in the container network namespace) but the pid of the `cc-shim` process (in the host network namespace).
In dockershim, the container pid is used to build the `netns` path of the sandbox container, so dockershim can't get the right `netns` path currently.
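For context, the `netns` path is derived from the pid roughly like this (the helper name is illustrative); with cc the reported pid is the `cc-shim` process in the host namespace, so the resulting path does not point at the pod's network namespace:

```go
package example

import "fmt"

// netnsPathFromPid builds the network namespace path for the process the
// Docker engine reports as the container's pid.
func netnsPathFromPid(pid int) string {
	return fmt.Sprintf("/proc/%d/ns/net", pid)
}
```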
Could we get the pid of the `qemu` process instead of the `cc-shim` process when cc-runtime creates a VM-based container?
@miaoyq we actually don't want to provide the pid of the qemu process since this is related to the pod and not to the container itself.
What we could do to solve your issue is to make sure that we run the shim in the right network namespace. Look into the virtcontainers shim implementation: `cc_shim.go` points to some functions in `shim.go`. This is where we should spawn the shim process inside the right namespace.
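Something along these lines, sketched here with the vishvananda/netns package; the function name, arguments, and use of that package are assumptions, not the actual virtcontainers code:

```go
package example

import (
	"os/exec"
	"runtime"

	"github.com/vishvananda/netns"
)

// startShimInNetns joins the sandbox network namespace before spawning the
// shim, so the shim (and therefore the pid Docker reports) lives inside the
// pod netns.
func startShimInNetns(netnsPath, shimPath string, args ...string) error {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// Remember the host namespace so we can switch back afterwards.
	origNS, err := netns.Get()
	if err != nil {
		return err
	}
	defer origNS.Close()
	defer netns.Set(origNS)

	targetNS, err := netns.GetFromPath(netnsPath)
	if err != nil {
		return err
	}
	defer targetNS.Close()

	// Move the current thread into the pod network namespace; the child
	// process forked below inherits it.
	if err := netns.Set(targetNS); err != nil {
		return err
	}

	cmd := exec.Command(shimPath, args...)
	return cmd.Start()
}
```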
Please open an issue on virtcontainers repo for that, so that we can discuss further on this.
Thanks @sboeuf, I also think it's reasonable to run the shim in the container network namespace. :) I have filed an issue on the virtcontainers repo. See: https://github.com/containers/virtcontainers/issues/615
@sameo @sboeuf BTW, could we reset the network for an existing cc container whose network wasn't set up when it was created?
I think I have got the answer in https://github.com/containers/virtcontainers/issues/241. Looking forward to this feature. :-)