Docker/gVisor fail to spawn a thousand runsc containers

Open ustiugov opened this issue 6 years ago • 6 comments

Dear gVisor developers,

I experience a problem when spawning ~1000 containers on the same host, each one running a simple TCP server inside. Spawning up to ~400 containers works just fine. The log is attached. The command to spawn each microVM is the following:

docker run -dit --name alpine_0 --runtime=runsc -p HOST_PORT:GUEST_PORT alpine /path/to/my_tcp_server

There are a few error types in the log, including ones reported by Docker and by the Go runtime. At least one of the error types originates from a fork() failure.
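A fork() failure on a host running this many sandboxes is often worth checking against the host's process and thread limits. The following is only a diagnostic sketch, assuming a Linux host with /proc mounted; the helper names (read_proc, soft_limit, report_limits) are made up for illustration, and nothing in the attached log confirms that any of these limits was the one actually hit.

```shell
#!/bin/bash
# Diagnostic sketch: print host limits that commonly bound process/thread creation.
# Assumption: Linux with /proc mounted. Helper names are illustrative only.

read_proc() {
    # Print the contents of a /proc file, or "unavailable" if it cannot be read.
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unavailable"
    fi
}

soft_limit() {
    # $1 is a label from /proc/self/limits, e.g. "Max processes".
    # Each row ends with: soft limit, hard limit, units; so the soft limit
    # is the third field from the end.
    if [ -r /proc/self/limits ]; then
        grep "^$1" /proc/self/limits | awk '{print $(NF-2)}'
    else
        echo "unavailable"
    fi
}

report_limits() {
    echo "kernel.pid_max:      $(read_proc /proc/sys/kernel/pid_max)"
    echo "kernel.threads-max:  $(read_proc /proc/sys/kernel/threads-max)"
    echo "max processes, soft: $(soft_limit 'Max processes')"
    echo "max open files, soft: $(soft_limit 'Max open files')"
    # Fourth field of /proc/loadavg is "runnable/total" scheduling entities.
    echo "tasks currently live: $(read_proc /proc/loadavg | cut -d' ' -f4 | cut -d/ -f2)"
}

report_limits
```

Running it once before the test and again after the first docker run failures shows whether the live task count is approaching any of the printed limits.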

I wonder if anyone has experience with booting that many microVMs on a single server and could kindly share comments or recommendations on the setup? The idea is to mimic a FaaS setting where hundreds or thousands of active functions are RPC-invoked and share the same physical server. Thank you in advance!

Regards, Dmitrii

Attachment: log.txt

Linux iccluster102 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Docker version 18.09.6, build 481bc77
runsc version release-20190304.1-185-gff8ed5e6a5a3
spec: 1.0.1-dev

ustiugov avatar May 13 '19 19:05 ustiugov

For completeness' sake, can you post the /etc/docker/daemon.json that you used?

ianlewis avatar May 31 '19 08:05 ianlewis

Hi @ianlewis, I use the default one.

{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}

ustiugov avatar Jun 03 '19 19:06 ustiugov

The behavior can be reproduced with the following script:

#!/bin/bash
# Usage: /path/to/script THR_NUM NUM RUNTIME

THR_NUM=$1   # thread count passed to the server binary
NUM=$2       # number of containers to start
RUNTIME=$3   # Docker runtime name, e.g. runsc
echo "Running VM with runtime=$RUNTIME, thread_num=$THR_NUM, VM count=$NUM"

sudo sysctl -w net.ipv4.ip_local_port_range="51000 65535"
sudo sysctl -w net.ipv4.conf.all.forwarding=1
# Avoid "neighbour: arp_cache: neighbor table overflow!"
sudo sysctl -w net.ipv4.neigh.default.gc_thresh1=1024
sudo sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
sudo sysctl -w net.ipv4.neigh.default.gc_thresh3=4096

# Start NUM containers; container i publishes host port 33000+i to guest port 5201.
for ((i=0; i<NUM; i++)); do
    port=$((33000 + i))
    docker run -dit --rm --name "alpine_${i}" --runtime="${RUNTIME}" -p "${port}:5201" \
        ustiugov/alpine_gv /tmp/servers/linux_synthetic "$THR_NUM" 5201
done

echo "Guests are ready!"

echo Guests are ready!

Executed with the arguments: /path/to/script 1 1000 runsc
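Since the script prints "Guests are ready!" whether or not every docker run succeeded, it can help to count how many containers actually came up. A minimal sketch, assuming the alpine_ name prefix used by the script above; count_prefixed is a hypothetical helper, not part of Docker or gVisor.

```shell
#!/bin/bash
# Sketch: count how many of the alpine_* containers are actually running.
# count_prefixed is an illustrative helper name.

count_prefixed() {
    # Count lines on stdin that start with the prefix given as $1.
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c "^$1" || true
}

if command -v docker >/dev/null 2>&1; then
    running=$(docker ps --format '{{.Names}}' | count_prefixed alpine_)
    all=$(docker ps -a --format '{{.Names}}' | count_prefixed alpine_)
    echo "alpine_* containers running: $running (all states: $all)"
else
    echo "docker not found on PATH"
fi
```

Comparing the running count with the NUM argument gives the point at which launches started to fail, which can then be matched against timestamps in the Docker and runsc logs.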

ustiugov avatar Jun 03 '19 19:06 ustiugov

A friendly reminder that this issue had no activity for 120 days.

github-actions[bot] avatar Sep 15 '23 00:09 github-actions[bot]

This seemed to be due to "no space left on device" errors.
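For anyone hitting the same message, a quick check of free blocks and free inodes on the relevant filesystems may narrow it down. This is a sketch under assumptions: /var/lib/docker is Docker's default data root, while the other directories listed are guesses at where runtime state might live on a given host. The same errno (ENOSPC) is also returned when the per-user inotify watch limit is reached, so that limit is printed too. check_space is an illustrative helper name.

```shell
#!/bin/bash
# Sketch: look for exhausted disk blocks or inodes behind "no space left on device".
# Assumption: the directory list below is a guess; adjust it for the host.

check_space() {
    # Report block and inode usage for the filesystem holding directory $1.
    if [ -d "$1" ]; then
        echo "== $1: blocks =="
        df -h "$1"
        echo "== $1: inodes =="
        df -i "$1"
    else
        echo "== $1: not present =="
    fi
}

for dir in /var/lib/docker /run /tmp; do
    check_space "$dir"
done

# ENOSPC can also mean the inotify watch limit was reached (Linux only).
if [ -r /proc/sys/fs/inotify/max_user_watches ]; then
    echo "fs.inotify.max_user_watches: $(cat /proc/sys/fs/inotify/max_user_watches)"
fi
```

A filesystem showing 100% under either the blocks or the inodes listing would be consistent with the error, though the original log would still be needed to confirm which resource ran out.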

ayushr2 avatar Sep 15 '23 18:09 ayushr2

A friendly reminder that this issue had no activity for 120 days.

github-actions[bot] avatar Jan 14 '24 00:01 github-actions[bot]

This issue has been closed due to lack of activity.

github-actions[bot] avatar Apr 14 '24 00:04 github-actions[bot]