gvisor
Docker/gVisor fail to spawn a thousand runsc containers
Dear gVisor developers,
I am having a problem when spawning ~1000 containers on the same host, each one running a simplistic TCP server inside. A smaller number (up to ~400 containers) works just fine. The log is attached. The command to spawn each microVM is the following:
docker run -dit --name alpine_0 --runtime=runsc -p HOST_PORT:GUEST_PORT alpine /path/to/my_tcp_server
There are a few error types in the log, including ones reported by Docker and by the Go runtime. At least one of the error types originates from a fork() failure.
I wonder if anyone has experience with booting that many microVMs on a single server and could kindly share comments or recommendations on the setup. The idea is to mimic a FaaS setting where hundreds or thousands of active functions are RPC-invoked and share the same physical server. Thank you in advance!
Regards, Dmitrii
Attachment: log.txt
Linux iccluster102 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Docker version 18.09.6, build 481bc77
runsc version release-20190304.1-185-gff8ed5e6a5a3
spec: 1.0.1-dev
For completeness' sake, can you post the /etc/docker/daemon.json that you used?
Hi @ianlewis, I use the default one.
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
The behavior can be reproduced with the following script:
#!/bin/bash
THR_NUM=$1
NUM=$2
RUNTIME=$3
echo Running VM with runtime=$RUNTIME, thread_num=$THR_NUM, VM count=$NUM
sudo sysctl -w net.ipv4.ip_local_port_range="51000 65535"
sudo sysctl -w net.ipv4.conf.all.forwarding=1
# Avoid "neighbour: arp_cache: neighbor table overflow!"
sudo sysctl -w net.ipv4.neigh.default.gc_thresh1=1024
sudo sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
sudo sysctl -w net.ipv4.neigh.default.gc_thresh3=4096
for ((i=0; i<NUM; i++)); do
  port=$((33000 + i))
  docker run -dit --rm --name "alpine_${i}" --runtime="${RUNTIME}" -p "${port}:5201" \
    ustiugov/alpine_gv /tmp/servers/linux_synthetic "$THR_NUM" 5201
done
echo Guests are ready!
executed with args: /path/to/script 1 1000 runsc
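A side note on the port numbers in the script: it publishes host ports starting at 33000 and moves the ephemeral port range to start at 51000, so the two only stay apart while 33000 + NUM - 1 is below 51000. A minimal sketch of that arithmetic check follows. The function name ports_below_ephemeral is made up for illustration and is not part of Docker or runsc.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): succeeds when the
# published host ports BASE .. BASE+COUNT-1 all lie below EPHEMERAL_LOW,
# the lower bound written to net.ipv4.ip_local_port_range.
ports_below_ephemeral() {
  local base=$1 count=$2 ephemeral_low=$3
  local last=$((base + count - 1))   # highest host port the loop will publish
  [ "$last" -lt "$ephemeral_low" ]
}
```

With the values from the script, ports_below_ephemeral 33000 1000 51000 succeeds, so port overlap is unlikely to explain the failures at 1000 containers.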
A friendly reminder that this issue had no activity for 120 days.
This seemed to be due to "no space left on device" errors.
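If disk space is the suspect, one way to catch it early is to check free space before each launch. The sketch below is only that, a sketch: has_free_kb is a hypothetical helper, and the thread does not say which filesystem filled up, so the path to check (for example Docker's default data root, /var/lib/docker) is an assumption. Note that the same error message can also come from inode exhaustion, which df -i would show instead.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the filesystem holding PATH has at
# least MIN_KB kilobytes available.
has_free_kb() {
  local path=$1 min_kb=$2 avail_kb
  # df -P prints one POSIX-format line per filesystem and -k reports
  # 1024-byte units; column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}

# Possible use inside the launch loop (path and threshold are assumptions):
#   has_free_kb /var/lib/docker 1048576 || { echo "low disk space" >&2; break; }
```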
This issue has been closed due to lack of activity.