
EPERM during setrlimit() execution prevents apps from initializing

Open Oliver-Sellwood opened this issue 3 years ago • 10 comments

Hi, Great project!

I am having issues where, for some containers, the dockerd inside the container becomes unresponsive. Restarting the daemon once it reaches this state prints the following and then nothing more:

root@9af3015100f8:/opt# dockerd
INFO[2021-07-14T23:46:24.062904498Z] Starting up                                  
^C

The only solution I have found to get the daemon responsive again is to restart the container, but the issue pops up fairly frequently (i.e., within 3 or 4 container launches).

I am seeing this most recently with the golang image, but have seen it in others. Not sure where to look for more information to help debug, sorry.

Oliver-Sellwood avatar Jul 14 '21 23:07 Oliver-Sellwood

Thanks @Oliver-Sellwood!

We have fixed a few issues that could explain what you are observing. Have you built Sysbox from source or are you running the latest release (v0.3.0)? If you are running v0.3.0, please build Sysbox from source (very easy process as it is fully containerized) and try again.

Also, keep in mind that we are in the testing phase of the next Sysbox-CE release (v0.4.0) which is expected to come out within the next few hours / days.

rodnymolina avatar Jul 14 '21 23:07 rodnymolina

Yeah, I'm on 0.3.0. I don't have time to test from source right now; I will update when I've had a chance (or just install 0.4.0 if it is already out by the time I get around to it).

within the next few hours / days.

Nice, gratz!

Oliver-Sellwood avatar Jul 15 '21 00:07 Oliver-Sellwood

@Oliver-Sellwood, just to let you know that Sysbox v0.4.0 is available now.

rodnymolina avatar Jul 15 '21 06:07 rodnymolina

Thanks :)

Now I get these:

Starting es03 ... error
Starting es02 ... error

ERROR: for es03  Cannot start service es03: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown

ERROR: for es02  Cannot start service es02: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown

ERROR: for es01  Cannot start service es01: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown

ERROR: for es03  Cannot start service es03: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown

ERROR: for es02  Cannot start service es02: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
ERROR: Encountered errors while bringing up the project.

and in the dockerd logs

INFO[2021-07-18T01:07:40.055740278Z] shim disconnected                             id=2c2d1a32e0edceca97c54b09980b9f6af699d73799d4610af9d2c4ca826b1841
ERRO[2021-07-18T01:07:40.055868856Z] copy shim log                                 error="read /proc/self/fd/14: file already closed"
ERRO[2021-07-18T01:07:40.058000286Z] stream copy error: reading from a closed fifo 
ERRO[2021-07-18T01:07:40.058056997Z] stream copy error: reading from a closed fifo 
INFO[2021-07-18T01:07:40.302366347Z] shim disconnected                             id=d14b01fa53a0a5882418cce7a4c804db945453dc10862d02f7df19a7ce332eb8
ERRO[2021-07-18T01:07:40.302423967Z] copy shim log                                 error="read /proc/self/fd/18: file already closed"
ERRO[2021-07-18T01:07:40.311180231Z] stream copy error: reading from a closed fifo 
ERRO[2021-07-18T01:07:40.311201742Z] stream copy error: reading from a closed fifo 
INFO[2021-07-18T01:07:40.324172533Z] shim disconnected                             id=b96fbd96eb5538ad1c3cb6a815d1b31a25b77cc234c345fdac37bee2cecbefd1
ERRO[2021-07-18T01:07:40.324311518Z] copy shim log                                 error="read /proc/self/fd/12: file already closed"
ERRO[2021-07-18T01:07:40.326996932Z] stream copy error: reading from a closed fifo 
ERRO[2021-07-18T01:07:40.327002240Z] stream copy error: reading from a closed fifo 
ERRO[2021-07-18T01:07:40.537804544Z] 2c2d1a32e0edceca97c54b09980b9f6af699d73799d4610af9d2c4ca826b1841 cleanup: failed to delete container from containerd: no such container 
ERRO[2021-07-18T01:07:40.537881021Z] Handler for POST /v1.41/containers/2c2d1a32e0edceca97c54b09980b9f6af699d73799d4610af9d2c4ca826b1841/start returned error: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown 
ERRO[2021-07-18T01:07:40.684948301Z] b96fbd96eb5538ad1c3cb6a815d1b31a25b77cc234c345fdac37bee2cecbefd1 cleanup: failed to delete container from containerd: no such container 
ERRO[2021-07-18T01:07:40.685018981Z] Handler for POST /v1.41/containers/b96fbd96eb5538ad1c3cb6a815d1b31a25b77cc234c345fdac37bee2cecbefd1/start returned error: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown 
ERRO[2021-07-18T01:07:40.789741377Z] d14b01fa53a0a5882418cce7a4c804db945453dc10862d02f7df19a7ce332eb8 cleanup: failed to delete container from containerd: no such container 
ERRO[2021-07-18T01:07:40.789815199Z] Handler for POST /v1.41/containers/d14b01fa53a0a5882418cce7a4c804db945453dc10862d02f7df19a7ce332eb8/start returned error: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
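As an aside, the opaque "rlimit type N" in these runc errors is just the numeric Linux resource-limit constant. A quick check with Python's resource module (a hedged illustration, assuming a Linux host) shows that type 8 is RLIMIT_MEMLOCK, i.e., the failing call is setrlimit(RLIMIT_MEMLOCK, ...), triggered by the memlock ulimits in the compose file:

```python
import resource

# runc reports "rlimit type N" using the Linux resource-limit numbering.
# Type 8 is RLIMIT_MEMLOCK (and type 7, which appears in a later comment
# in this thread, is RLIMIT_NOFILE).
print("RLIMIT_NOFILE  =", resource.RLIMIT_NOFILE)   # 7 on Linux
print("RLIMIT_MEMLOCK =", resource.RLIMIT_MEMLOCK)  # 8 on Linux
```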

I also tried with increased memlock limits, but still no luck, e.g.:

docker run -dit -v `pwd`/test:/opt --ulimit memlock=8192000000:8192000000 golang bash

Env setup (inside the container)

curl -fsSL https://get.docker.com |sh ## yolo
curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
cd /opt
/usr/local/bin/docker-compose up

and the docker-compose.yml is

version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.3
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
    networks:
      - elastic

  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.3
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - elastic

  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.3
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic

  kib01:
    image: docker.elastic.co/kibana/kibana:7.13.3
    container_name: kib01
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_URL: http://es01:9200
      ELASTICSEARCH_HOSTS: '["http://es01:9200","http://es02:9200","http://es03:9200"]'
    networks:
      - elastic

volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local

networks:
  elastic:
    driver: bridge

Oliver-Sellwood avatar Jul 18 '21 01:07 Oliver-Sellwood

@Oliver-Sellwood, this time you ran into something very different from before. I was able to reproduce the issue following your description (very helpful, btw).

The bad news is that we are dealing with two different issues. The good news is that I believe I have potential fixes in mind for both of them. Also, if you're willing to make a couple of minor adjustments, I have a simple workaround that you can apply (see further below).

Both issues here are a consequence of the special requirements imposed by Elasticsearch during initialization. As you can see below, during problem reproduction, Elasticsearch complains that not enough resources are allocated for its operation:

es02     | ERROR: [2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch.
es02     | bootstrap check failure [1] of [2]: memory locking requested for elasticsearch process but memory is not locked
es02     | bootstrap check failure [2] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
es02     | ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/es-docker-cluster.log
es02     | {"type": "server", "timestamp": "2021-07-18T06:14:20,669Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "es-docker-cluster", "node.name": "es02", "message": "stopping ..." }
es01     | ERROR: [2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch.
es01     | bootstrap check failure [1] of [2]: memory locking requested for elasticsearch process but memory is not locked
es01     | bootstrap check failure [2] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
es01     | ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/es-docker-cluster.log
es01     | {"type": "server", "timestamp": "2021-07-18T06:14:20,690Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "es-docker-cluster", "node.name": "es01", "message": "stopping ..." }
es03     | ERROR: [2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch.
es03     | bootstrap check failure [1] of [2]: memory locking requested for elasticsearch process but memory is not locked
es03     | bootstrap check failure [2] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
es03     | ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/es-docker-cluster.log

The problem is that, as can be seen above, none of the system-resource adjustments (limit increments) attempted during Elasticsearch initialization succeeded. There are two separate issues that need resolution here:

  • We should allow root users within a sys container to modify this node: /proc/sys/vm/max_map_count, which corresponds to the sysctl observed above (vm.max_map_count). Right now this node is read-only inside the container, hence the error message above.

  • EPERM during setrlimit() execution: for this second issue I see two approaches that we could explore. We can either offer full flexibility by allowing execution of the setrlimit() syscall through seccomp-based trapping logic, or offer a more constrained (but perhaps more secure) approach in which the user must define ahead of time the limits to be associated with all the processes within a sys container (e.g. through a config file). Obviously, those limits would need to be large enough to satisfy all the app requirements within the sys container.

Notice that currently only soft limits can be modified within a sys container; apparently, hard limits cannot be raised from within a non-init user namespace:

root@test-1:/# cat /proc/self/limits
Limit                     Soft Limit           Hard Limit           Units
...
Max pending signals       31614                31614                signals
...
root@test-1:/#

root@test-1:/# ulimit -a
...
pending signals                 (-i) 31614
...
root@test-1:/# ulimit -i 31615
bash: ulimit: pending signals: cannot modify limit: Operation not permitted

root@test-1:/# ulimit -i 31614
root@test-1:/#
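The soft-vs-hard asymmetry shown above can be sketched with Python's resource module (a minimal illustration of the general setrlimit() rules, not Sysbox-specific): a process may freely lower its limits and move the soft limit anywhere up to the hard limit, but raising the hard limit requires CAP_SYS_RESOURCE in the owning user namespace, which root inside a sys container lacks:

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Lowering the soft limit is always permitted...
lowered = soft - 1 if soft != resource.RLIM_INFINITY and soft > 1 else soft
resource.setrlimit(resource.RLIMIT_NOFILE, (lowered, hard))

# ...and so is raising it back up to the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

# Raising the hard limit fails with EPERM unless the process holds
# CAP_SYS_RESOURCE in the owning user namespace (CPython surfaces the
# EPERM as a ValueError).
if hard != resource.RLIM_INFINITY:
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard + 1))
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))  # privileged: undo
        print("hard limit raise: permitted (privileged)")
    except (ValueError, OSError):
        print("hard limit raise: operation not permitted")
```

This is the same failure mode as the `ulimit -i 31615` example above, just exercised through the syscall wrapper directly.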

For now, I would suggest applying this simple workaround:

  • First of all, increase the vm.max_map_count sysctl at host level:
$ sudo sysctl -w vm.max_map_count=262144
vm.max_map_count = 262144
  • Secondly, to avoid the invocation of the setrlimit() syscall within the system container, ensure that the container is started with the memlock limit already set, as shown below. At this point you can also remove the 'ulimit' attributes from the Elastic docker-compose file.
$ docker run --runtime=sysbox-runc -it -v `pwd`/test:/opt --ulimit memlock=8192000000:8192000000 golang bash

You should be able to bring up all the containers that are part of the docker-compose stack now.
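To double-check the first workaround step, you can read the sysctl back (a small sanity check, assuming a Linux host; the 262144 threshold is the one from Elasticsearch's bootstrap-check output above):

```python
# Read vm.max_map_count back; Elasticsearch's bootstrap check needs >= 262144.
with open("/proc/sys/vm/max_map_count") as f:
    max_map_count = int(f.read())

print("vm.max_map_count =", max_map_count,
      "(ok)" if max_map_count >= 262144 else "(too low for Elasticsearch)")
```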

rodnymolina avatar Jul 18 '21 09:07 rodnymolina

I was having a similar issue using nestybox/gitlab-runner-docker Dockerfile when building locally with newer docker and gitlab-runner versions. Installing sysbox-ce 0.4.0 fixed my problem. Awesome job!

Mateus-Romera avatar Jul 23 '21 00:07 Mateus-Romera

Will adjust the title of this issue to reflect the new problem discovered by @Oliver-Sellwood. The original issue (dockerd freezes) was already tracked and fixed a few weeks ago as part of various other issues.

rodnymolina avatar Jul 23 '21 03:07 rodnymolina

Note: this problem was not caught by Sysbox's extended test suites (where an Elasticsearch test case is already present) because that test case lacks the -e "bootstrap.memory_lock=true" --ulimit memlock=-1:-1 attributes, which are what eventually trigger the setrlimit() syscall.

rodnymolina avatar Jul 23 '21 03:07 rodnymolina

I ran into a similar issue with the following docker-compose service:

  dns01:
    <<: *service
    hostname: xyz
    container_name: xyz
    image: devenv:latest
    ulimits:
      nofile:
        soft: 3009550
        hard: 3009550
    networks:
      boulder:
        ipv4_address: 10.4.12.20

Error response from daemon: OCI runtime create failed: container_linux.go:393: starting container process caused: process_linux.go:607: container init caused: process_linux.go:465: setting rlimits for ready process caused: error setting rlimit type 7: operation not permitted: unknown

aaomidi avatar Jan 20 '22 23:01 aaomidi

I don't want to derail the conversation here, but for folks running into issues with Elasticsearch's insistence on memlock, you can disable this check by setting the environment variable bootstrap.memory_lock=false in your Elasticsearch container. You will want to make sure you have swap disabled.

johnstcn avatar Sep 20 '22 15:09 johnstcn