
Update restore network fix for checkpointed container to latest bouch…

Open huikang opened this issue 10 years ago • 8 comments

…er/docker/cr-combined.

Reuse the endpoint of the checkpointed container on restore. Pass the veth pair name to criu when restoring a checkpointed container.

TODO: Add libnetwork API to retrieve ethXXX in the container

Signed-off-by: Hui Kang [email protected]
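For readers unfamiliar with how a veth pair name reaches CRIU, here is a minimal sketch of the underlying command line. It is not the code in this PR (which goes through Docker and libcontainer); it only assumes CRIU's `--veth-pair IN=OUT` and `--images-dir` restore options. The helper name `criu_restore_args`, the checkpoint directory, and the host-side interface name `veth1a2b3c` are hypothetical.

```python
def criu_restore_args(images_dir, veth_pairs):
    """Build an argv list for `criu restore` that remaps veth devices.

    veth_pairs is a list of (container_side, host_side) interface names.
    Each pair becomes one `--veth-pair IN=OUT` option.
    """
    args = ["criu", "restore", "--images-dir", images_dir]
    for container_if, host_if in veth_pairs:
        args += ["--veth-pair", f"{container_if}={host_if}"]
    return args


# Hypothetical values, for illustration only.
print(criu_restore_args("/tmp/checkpoint", [("eth0", "veth1a2b3c")]))
```

The sketch only builds the argument list; actually running it requires root, a real checkpoint image directory, and a host-side name that matches what the network driver expects.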

huikang avatar Oct 18 '15 03:10 huikang

I'll try this out as soon as I get a chance.

boucher avatar Oct 18 '15 13:10 boucher

@boucher Updated. Looks like we need to work with libnetwork regarding the interface.

huikang avatar Oct 18 '15 21:10 huikang

Sorry for the delay. This seems to be working for me locally. Unfortunately, I can't really merge it with the libnetwork change. We need to figure out a way to do this that they'll accept.

boucher avatar Oct 22 '15 15:10 boucher

Unfortunately, I've rebased and this no longer applies again. container.NetworkSettings.EndpointID no longer appears to exist, and the releaseNetwork logic has been moved around quite a bit.

boucher avatar Oct 29 '15 18:10 boucher

I've pushed an attempted update here, but it has some flaws: https://github.com/boucher/docker/tree/huikang-network-fix-rebased

boucher avatar Oct 29 '15 18:10 boucher

I will look at it soon. Thanks.

huikang avatar Oct 29 '15 19:10 huikang

I am trying to checkpoint and restore a container with an active TCP connection. For this I took the latest code from boucher's cr-combined branch and compiled it with the Experimental flag enabled.

I have compiled and installed CRIU version 1.8.

I have a Docker image (TCP server) that contains code to listen on a TCP socket. I run client code that sends messages to the server and waits for the server's response. The client runs on the same host as the containers.

When I issue a checkpoint, the client sends its message to the server and keeps waiting for the response.

Once the server container is restored (the same container, not a new one), the client is unable to send the message and exits. I also found that the eth0 interface of the restored container is not in the running state (the Docker bridge is not pingable from inside the container).
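One way to check the interface state described above without relying on ping is to read the kernel's `operstate` file for the interface. The following is a minimal sketch, assuming it runs inside the restored container's network namespace on Linux; the helper name `interface_operstate` and its `sysfs_root` parameter are hypothetical and exist only to make the example easy to test.

```python
from pathlib import Path


def interface_operstate(name, sysfs_root="/sys/class/net"):
    """Return the kernel-reported state of a network interface.

    Typical values are "up", "down", "lowerlayerdown" and "unknown".
    Returns None if the interface does not exist.
    """
    path = Path(sysfs_root) / name / "operstate"
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


# Inside the container, this reports the state of eth0.
print(interface_operstate("eth0"))
```

Comparing the value before the checkpoint and after the restore would show whether the link itself came back or only the addressing is missing.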

The above issue is not seen if I run Docker with the --net=host option; checkpoint and restore of the TCP connection then works seamlessly.

Is this a known issue, and is there any workaround for it?

amakumar avatar Dec 12 '15 03:12 amakumar

@amakumar, thanks a lot for your post.

I faced the same issue, and --net=host bypasses it. I spent several days checking this and found that the fd files are restored properly.

Process number matters

Here is the clue I found:

If you do not use --net=host, ps auxf shows:

root 14333 0.0 3.0 1162612 30932 ? Sl Jun14 1:04 docker daemon
root 23262 0.0 1.8 121968 18712 ? Sl 22:23 0:00 \_ docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9095 -container-ip 1
root 23285 2.8 4.2 412488 42964 ? Ssl 22:23 0:01 \_ ./moped_server -j 2

Here we get TWO processes! But if you run with '--net=host', we get only ONE process.

That is the difference.

See https://criu.org/Inheriting_FDs_on_restore. With two processes, the container relies on FIFOs to connect them to each other.

lsof -p [container process id]

myapp 23285 root 1w FIFO 0,9 0t0 262144 pipe
myapp 23285 root 2w FIFO 0,9 0t0 262145 pipe
myapp 23285 root 3u sock 0,8 0t0 260837 can't identify protocol

Maybe Docker's native checkpoint/restore does not support or handle CRIU's inherit-fd operation very well.
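To make it easier to see which descriptors would need this kind of handling, here is a minimal sketch that picks the FIFO entries out of lsof output. It assumes the default lsof column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); the helper name `fifo_descriptors` is hypothetical, and the sample rows are the ones quoted in this comment.

```python
import re


def fifo_descriptors(lsof_output):
    """Return (fd, inode) pairs for every FIFO row in `lsof -p PID` output."""
    found = []
    for line in lsof_output.splitlines():
        # NAME may contain spaces, so split into at most nine fields.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[4] != "FIFO":
            continue
        # The FD column looks like "1w" or "3u"; entries such as "cwd" are skipped.
        match = re.match(r"\d+", fields[3])
        if match is None:
            continue
        found.append((int(match.group()), int(fields[7])))
    return found


SAMPLE = """\
myapp 23285 root 1w FIFO 0,9 0t0 262144 pipe
myapp 23285 root 2w FIFO 0,9 0t0 262145 pipe
myapp 23285 root 3u sock 0,8 0t0 260837 can't identify protocol
"""

print(fifo_descriptors(SAMPLE))  # [(1, 262144), (2, 262145)]
```

Here descriptors 1 and 2 (stdout and stderr) are pipes whose other ends live outside the container, which is the situation the CRIU page linked above describes.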

hixichen avatar Jun 15 '16 22:06 hixichen