Container's TCP connection is closed at the end of checkpoint; checkpoint and restore also take a long time
- [x] This is a bug report
- [ ] This is a feature request
- [x] I searched existing issues before opening this one
Expected behavior
- TCP sessions do not close
- Reduced checkpoint and restore processing time
Actual behavior
I run my own TCP server application (echoserver) as a Docker container, establish a TCP session, receive data at 100 ms intervals, and occasionally checkpoint and restore the container. The TCP server application uses malloc to allocate a 1 GB region. Two problems occur:
- TCP session closes at checkpoint end
- Checkpoint and restore take a long time
I have two questions
① Why is the TCP session closed? Is there a way to solve it? I have already set the additional option (tcp-established) as described at https://criu.org/Docker
② Checkpoint and restore take too long. Is there a way to speed them up?
Steps to reproduce the behavior
① Start the TCP server application (echoserver) as a Docker container
node@node:~$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
echoserver-image latest f8b954ec7462 About an hour ago 64.2MB
ubuntu 18.04 8e4ce0a6ce69 2 weeks ago 64.2MB
node@node:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
384c2bee3c68 echoserver-image "echoserver 1234" 37 minutes ago Up 14 minutes 0.0.0.0:1234->1234/tcp echoserver
② Start a TCP client on another node, establish a session, and send data at 100 ms intervals. Check the following logs on the TCP server:
node@node:~$ docker logs -f echoserver
run telnet 384c2bee3c68(v6) 1234
[1] connection (fd==4) from 192.168.1.50:37782
[1] received (fd==4) 2 bytes, 1
[1] received (fd==4) 2 bytes, 2
[1] received (fd==4) 2 bytes, 3
:
:
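The client used in step ② is not included in this report. For anyone trying to reproduce it, here is a minimal Python sketch of an equivalent client. It assumes the server echoes each message back (suggested by the name echoserver, but not confirmed here). The function name `send_counter` and the 120-second socket timeout are illustrative choices, picked so that a long checkpoint freeze is not mistaken for a lost session.

```python
import socket
import time


def send_counter(host, port, count, interval=0.1, timeout=120.0):
    """Send "1\n", "2\n", ... every `interval` seconds and wait for each echo.

    Returns how many messages were echoed back before the session failed;
    the result equals `count` if the TCP session survived the whole run.
    """
    echoed = 0
    # The timeout applies to connect() and to every later send/recv call.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for i in range(1, count + 1):
            try:
                sock.sendall(f"{i}\n".encode())
                data = sock.recv(1024)
            except OSError as exc:
                # Covers a TCP reset (ConnectionResetError), a broken pipe
                # and a timeout; all of them are OSError subclasses.
                print(f"session lost after {echoed} messages: {exc!r}")
                break
            if not data:
                # recv() returning b"" means the peer closed the connection.
                print(f"peer closed the session after {echoed} messages")
                break
            echoed += 1
            time.sleep(interval)
    return echoed
```

Running something like `send_counter("192.168.1.201", 1234, count=1200)` from the client node (the address is the server's enp0s8 address from `ip a` below) while the checkpoint is taken should show whether the reset arrives as an exception or as an orderly close.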
③ Perform a checkpoint. It takes 42 seconds to finish. Also, a TCP reset packet is sent when the checkpoint ends
node@node:~$ docker logs -f echoserver
:
:
[1] received (fd==4) 3 bytes, 29
[1] received (fd==4) 3 bytes, 30
[1] received (fd==4) 3 bytes, 31
[1] connection (fd==4) closed.
node@node:$ time docker checkpoint create echoserver checkpoint1
checkpoint1
real 0m42.413s
user 0m0.010s
sys 0m0.019s
④ Perform a restore. It takes about 1 minute to complete
node@node:$ time docker start --checkpoint checkpoint1 echoserver
real 1m7.690s
user 0m0.020s
sys 0m0.020s
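To put the two `real` times in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the checkpoint image is dominated by the 1 GiB malloc region and that the application has actually touched all of it. Neither is verified in this report, so treat the figures as an order-of-magnitude estimate only. The helper names are illustrative.

```python
import re

MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumption: the dumped memory is roughly the 1 GiB malloc region.
ASSUMED_IMAGE_BYTES = 1 * GIB


def parse_real(value):
    """Convert a `time` field such as '0m42.413s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def throughput_mib_s(size_bytes, real):
    """Effective throughput in MiB/s for moving `size_bytes` in `real` time."""
    return size_bytes / MIB / parse_real(real)


for label, real in (("checkpoint", "0m42.413s"), ("restore", "1m7.690s")):
    rate = throughput_mib_s(ASSUMED_IMAGE_BYTES, real)
    print(f"{label}: {rate:.1f} MiB/s")
```

Under that assumption the checkpoint works out to roughly 24 MiB/s and the restore to roughly 15 MiB/s, far below what local disk or memory copies normally achieve, so something other than raw I/O is probably slowing it down (only 1 CPU and 3.844 GiB of RAM are available according to `docker info`).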
Output of docker version:
node@node:~$ docker --version
Docker version 19.03.12, build 48a66213fe
Output of docker info:
node@node:~/docker_test/echoserver$ docker info
Client:
Debug Mode: false
Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 3
Server Version: 19.03.12
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.3.0-53-generic
Operating System: Ubuntu 18.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.844GiB
Name: node
ID: XUWC:CGSD:ZIDK:OL6C:XHOO:MFRN:XLJE:57TX:ZSYA:KNSY:J6MF:V52X
Docker Root Dir: /var/lib/docker
Debug Mode: false
HTTP Proxy: XXXXXXXXXXXXXXXXXXXXXXXX
HTTPS Proxy: YYYYYYYYYYYYYYYYYYYYYYYYY
No Proxy: 127.0.0.1,localhost
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.)
node@node:~$ criu --version
Version: 3.14
node@node:~$ cat /etc/criu/runc.conf
tcp-established
node@node:~$ uname -a
Linux node 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
node@node:~$ ps aux | grep docker
root 1304 1.2 19.5 1727076 786488 ? Ssl 11:48 0:35 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --experimental
root 6457 0.0 0.0 479368 2392 ? Sl 12:29 0:00 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 1234 -container-ip 172.17.0.2 -container-port 1234
root 6466 0.0 0.1 109104 4924 ? Sl 12:30 0:00 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/384c2bee3c68462649efa7086d4db3469d5bed6dfb933df46710f547f99b926a -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
node@node:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:d9:7a:39 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute enp0s3
valid_lft 82002sec preferred_lft 82002sec
inet6 fe80::c55a:d51:8927:2d2/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:a9:d8:54 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.201/24 brd 192.168.1.255 scope global noprefixroute enp0s8
valid_lft forever preferred_lft forever
inet6 fe80::2584:598:35f3:a3b4/64 scope link noprefixroute
valid_lft forever preferred_lft forever
4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:a8:51:c9 brd ff:ff:ff:ff:ff:ff
inet 192.168.2.201/24 brd 192.168.2.255 scope global noprefixroute enp0s9
valid_lft forever preferred_lft forever
inet6 fe80::da08:220:2abd:1bb2/64 scope link noprefixroute
valid_lft forever preferred_lft forever
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:2b:b0:13:88 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:2bff:feb0:1388/64 scope link
valid_lft forever preferred_lft forever
19: vethf587834@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
link/ether 42:77:9e:a6:c1:93 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::4077:9eff:fea6:c193/64 scope link
valid_lft forever preferred_lft forever
node@node:~$ docker network ls
NETWORK ID NAME DRIVER SCOPE
d4fa7d0a3c61 bridge bridge local
9473bedd0148 host host local
f6d26b2632da none null local
node@node:~$ docker network inspect d4fa7d0a3c61
[
{
"Name": "bridge",
"Id": "d4fa7d0a3c61361d03b2a014ca74702f4e01bc71e92686ba420471ba30d955a4",
"Created": "2020-07-06T11:48:22.573255955+09:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.17.0.0/16",
"Gateway": "172.17.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"384c2bee3c68462649efa7086d4db3469d5bed6dfb933df46710f547f99b926a": {
"Name": "echoserver",
"EndpointID": "4f57d8384ce43ef22c054ccff8c6bd6090562185dc937f4c5402809fa59652a0",
"MacAddress": "02:42:ac:11:00:02",
"IPv4Address": "172.17.0.2/16",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.bridge.default_bridge": "true",
"com.docker.network.bridge.enable_icc": "true",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
"com.docker.network.bridge.name": "docker0",
"com.docker.network.driver.mtu": "1500"
},
"Labels": {}
}
]
@YuichiroMaeyama Hi. I realize you've had this problem for a long time, but have you been able to solve it? And if so, how?