TCP connection of a container is closed at the end of checkpoint. Checkpoint and restore also take a long time

Open YuichiroMaeyama opened this issue 5 years ago • 1 comment

  • [x] This is a bug report
  • [ ] This is a feature request
  • [x] I searched existing issues before opening this one

Expected behavior

  1. TCP sessions do not close
  2. Reduced checkpoint and restore processing time

Actual behavior

I run my own TCP server application (echoserver) as a Docker container, establish a TCP session, receive data at 100 ms intervals, and occasionally checkpoint and restore the container. The TCP server application uses malloc to allocate a 1 GB region. Two problems occur:

  1. TCP session closes at checkpoint end

  2. Checkpoint and restore takes a long time

I have two questions

① Why is the TCP session closed, and is there a way to prevent it? I set the additional option (tcp-established) as described at https://criu.org/Docker

② Checkpoint and restore take too long. Is there a way to speed them up?

Steps to reproduce the behavior

① Start the TCP server application (echoserver) as a Docker container

node@node:~$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
echoserver-image    latest              f8b954ec7462        About an hour ago   64.2MB
ubuntu              18.04               8e4ce0a6ce69        2 weeks ago         64.2MB
node@node:~$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                    NAMES
384c2bee3c68        echoserver-image    "echoserver 1234"   37 minutes ago      Up 14 minutes       0.0.0.0:1234->1234/tcp   echoserver
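
For readers who want to reproduce this without the reporter's binary, here is a minimal stand-in sketch in Python. It is not the actual echoserver (which uses malloc and is not shown in this report). The names `make_listener`, `handle`, `serve` and `BALLAST_BYTES` are hypothetical, the `bytearray` only approximates the 1 GB malloc'd region, and the log format is copied from the output shown in this issue.

```python
import socket

# Assumption: 1 GiB of ballast stands in for the report's malloc'd region.
BALLAST_BYTES = 1 << 30


def make_listener(port, host="0.0.0.0"):
    """Return a listening TCP socket bound to host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


def handle(conn, peer, conn_id):
    """Echo everything received on conn until the peer closes or resets."""
    fd = conn.fileno()
    print(f"[{conn_id}] connection (fd=={fd}) from {peer[0]}:{peer[1]}")
    count = 0
    with conn:
        while True:
            try:
                data = conn.recv(4096)
            except ConnectionResetError:
                break  # peer sent a reset
            if not data:
                break  # peer closed its side
            count += 1
            print(f"[{conn_id}] received (fd=={fd}) {len(data)} bytes, {count}")
            conn.sendall(data)
    print(f"[{conn_id}] connection (fd=={fd}) closed.")


def serve(listener, ballast_bytes=BALLAST_BYTES, max_clients=None):
    """Serve clients one at a time; max_clients=None means run forever."""
    ballast = bytearray(ballast_bytes)  # zero-filled, kept alive while serving
    served = 0
    while max_clients is None or served < max_clients:
        conn, peer = listener.accept()
        served += 1
        handle(conn, peer, served)
    return served
```

Calling `serve(make_listener(1234))` would roughly mirror the `echoserver 1234` command shown in the `docker ps` output. Whether the allocated region is actually resident in memory may affect how much CRIU has to dump, so a stand-in like this may not reproduce the same timings.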

② Start a TCP client on another node, establish a session, and send data at 100 ms intervals. The following logs appear on the TCP server:

node@node:~$ docker logs -f echoserver
run telnet 384c2bee3c68(v6) 1234 
[1] connection (fd==4) from 192.168.1.50:37782
[1] received (fd==4) 2 bytes, 1
[1] received (fd==4) 2 bytes, 2
[1] received (fd==4) 2 bytes, 3
:
:
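
The client side of this step could look like the following sketch. It is an assumption, not the reporter's client: `run_client` is a hypothetical name, and the payload (a decimal counter plus a newline) is a guess that happens to match the byte counts in the server log. The function reports when the connection is reset or closed, which is the symptom described in this issue.

```python
import socket
import time


def run_client(host, port, interval=0.1, max_messages=None):
    """Send an incrementing counter every `interval` seconds.

    Returns the number of messages that were echoed back before the
    connection ended or max_messages was reached.
    """
    echoed = 0
    with socket.create_connection((host, port)) as sock:
        while max_messages is None or echoed < max_messages:
            payload = f"{echoed + 1}\n".encode()
            try:
                sock.sendall(payload)
                reply = sock.recv(4096)
            except (ConnectionResetError, BrokenPipeError) as exc:
                # A TCP reset from the server side surfaces here.
                print(f"connection dropped after {echoed} messages: {exc!r}")
                break
            if not reply:
                print(f"server closed the connection after {echoed} messages")
                break
            echoed += 1
            time.sleep(interval)
    return echoed
```

With the addresses from this report, `run_client("192.168.1.201", 1234)` would target the published port on the Docker host. If the session survived a checkpoint, the call would simply block while the container is frozen instead of printing a drop message.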

③ Perform a checkpoint. It takes 42 seconds to finish. A TCP reset packet is also sent when the checkpoint ends.

node@node:~$ docker logs -f echoserver
:
:
[1] received (fd==4) 3 bytes, 29
[1] received (fd==4) 3 bytes, 30
[1] received (fd==4) 3 bytes, 31
[1] connection (fd==4) closed.
node@node:$ time docker checkpoint create echoserver checkpoint1
checkpoint1

real	0m42.413s
user	0m0.010s
sys	0m0.019s

④ Perform a restore. It takes about 1 minute to complete.

node@node:$ time docker start --checkpoint checkpoint1 echoserver

real	1m7.690s
user	0m0.020s
sys	0m0.020s
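
To put the timings in perspective, the following sketch converts them into an effective throughput. It assumes the whole 1 GiB region is resident and dominates the checkpoint image, which this report does not confirm. The elapsed times also include freezing the container and dumping other state, so the figures are only a rough lower bound on the actual I/O rate.

```python
GIB = 1 << 30
MIB = 1 << 20


def throughput_mib_s(size_bytes, seconds):
    """Effective throughput in MiB/s for moving size_bytes in seconds."""
    return size_bytes / MIB / seconds


# Wall-clock times taken from the `time` output in this report.
checkpoint_seconds = 42.413
restore_seconds = 67.690

print(f"checkpoint: {throughput_mib_s(GIB, checkpoint_seconds):.1f} MiB/s")  # 24.1 MiB/s
print(f"restore:    {throughput_mib_s(GIB, restore_seconds):.1f} MiB/s")     # 15.1 MiB/s
```

Under that assumption, both rates are low for local disk I/O, which suggests looking at the single CPU, the 3.844 GiB of total memory, and the storage backing the VirtualBox guest when investigating the slowness.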

Output of docker version:

node@node:~$ docker --version
Docker version 19.03.12, build 48a66213fe

Output of docker info:

node@node:~/docker_test/echoserver$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.3.0-53-generic
 Operating System: Ubuntu 18.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 3.844GiB
 Name: node
 ID: XUWC:CGSD:ZIDK:OL6C:XHOO:MFRN:XLJE:57TX:ZSYA:KNSY:J6MF:V52X
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: XXXXXXXXXXXXXXXXXXXXXXXX
 HTTPS Proxy: YYYYYYYYYYYYYYYYYYYYYYYYY
 No Proxy: 127.0.0.1,localhost
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.)

node@node:~$ criu --version
Version: 3.14
node@node:~$ cat /etc/criu/runc.conf 
tcp-established
node@node:~$ uname -a
Linux node 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
node@node:~$ ps aux | grep docker
root      1304  1.2 19.5 1727076 786488 ?      Ssl  11:48   0:35 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --experimental
root      6457  0.0  0.0 479368  2392 ?        Sl   12:29   0:00 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 1234 -container-ip 172.17.0.2 -container-port 1234
root      6466  0.0  0.1 109104  4924 ?        Sl   12:30   0:00 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/384c2bee3c68462649efa7086d4db3469d5bed6dfb933df46710f547f99b926a -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
node@node:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:d9:7a:39 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute enp0s3
       valid_lft 82002sec preferred_lft 82002sec
    inet6 fe80::c55a:d51:8927:2d2/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:a9:d8:54 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.201/24 brd 192.168.1.255 scope global noprefixroute enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::2584:598:35f3:a3b4/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:a8:51:c9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.201/24 brd 192.168.2.255 scope global noprefixroute enp0s9
       valid_lft forever preferred_lft forever
    inet6 fe80::da08:220:2abd:1bb2/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:2b:b0:13:88 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:2bff:feb0:1388/64 scope link 
       valid_lft forever preferred_lft forever
19: vethf587834@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 42:77:9e:a6:c1:93 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::4077:9eff:fea6:c193/64 scope link 
       valid_lft forever preferred_lft forever
node@node:~$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
d4fa7d0a3c61        bridge              bridge              local
9473bedd0148        host                host                local
f6d26b2632da        none                null                local
node@node:~$ docker network inspect d4fa7d0a3c61
[
    {
        "Name": "bridge",
        "Id": "d4fa7d0a3c61361d03b2a014ca74702f4e01bc71e92686ba420471ba30d955a4",
        "Created": "2020-07-06T11:48:22.573255955+09:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "384c2bee3c68462649efa7086d4db3469d5bed6dfb933df46710f547f99b926a": {
                "Name": "echoserver",
                "EndpointID": "4f57d8384ce43ef22c054ccff8c6bd6090562185dc937f4c5402809fa59652a0",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]

YuichiroMaeyama, Jul 06 '20 04:07

@YuichiroMaeyama Hi. I realize you've had this problem for a long time, but have you been able to solve it? And if so, how?

ilorj, Oct 26 '23 20:10