[🐛 Bug]: Memory leak in selenium/standalone-chrome?

Open cvalerio opened this issue 2 years ago • 2 comments

What happened?

We are moving our instances of selenium/standalone-chrome from a CentOS host to an Ubuntu Server host. Both hosts have 32GB RAM; the new host has 16 vCores. The old host ran 20 instances; the new host has to run 40.

On paper the RAM should be enough: our monitoring system shows that the instances on the old host never used more than 12GB, and the new host is tasked with hosting only those 40 containers. All containers are limited to a maximum of 1GB of RAM, for good measure.
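
For readers following the sizing argument, here is a minimal sketch of the arithmetic, using only the figures quoted in this report. The per-instance average is derived from the old host's peak, not measured, and the variable names are illustrative.

```python
# Figures quoted in the report; nothing here is measured by this script.
OLD_INSTANCES = 20
OLD_PEAK_GB = 12            # peak total usage observed on the old host
NEW_INSTANCES = 40
HOST_RAM_GB = 32            # RAM of the new host before the upgrade
LIMIT_PER_CONTAINER_GB = 1  # memory limit set on each container

# Scale the old host's peak linearly to the new instance count.
expected_total_gb = OLD_PEAK_GB * NEW_INSTANCES / OLD_INSTANCES

# Worst case if every container grows all the way to its limit.
worst_case_total_gb = LIMIT_PER_CONTAINER_GB * NEW_INSTANCES

limits_protect_host = worst_case_total_gb <= HOST_RAM_GB

print(f"expected total: {expected_total_gb:.0f} GB of {HOST_RAM_GB} GB")
print(f"worst case at the limits: {worst_case_total_gb} GB of {HOST_RAM_GB} GB")
print(f"limits alone keep the host safe: {limits_protect_host}")
```

The linear estimate fits in 32GB, but 40 containers that each creep up to their 1GB limit do not, so the limits by themselves cannot protect the host if every instance keeps growing.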

After moving the instances, we had a system failure on the new host caused by excessive memory consumption, which prompted an investigation. The containers together were occupying much more than the expected 24GB (though I was prepared to see spikes of 30GB). We added another 32GB to the new host, but after monitoring for a while we noticed that the instances on the new server (particularly their java processes) keep growing in RAM consumption even when they are doing nothing (all operations have been redirected back to the old host).

In fact, while an idle instance on the old host stabilizes at ~200MB, on the new host it keeps growing, slowly but steadily. A couple of hours after the last reboot, all instances are well above 800MB and still growing. Our concern is that an instance could hit the RAM limit in the middle of an operation and be killed/restarted, causing the current operation to fail.
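
One way to put numbers on this growth is to sample `docker stats` twice and compare. The following is a minimal sketch, assuming the Docker CLI is on the PATH and that memory usage is printed in the form `812.3MiB / 1GiB`; the helper names are hypothetical and not part of any Selenium or Docker tooling.

```python
import subprocess
from typing import Dict, Tuple

# Binary units as printed by `docker stats` for memory usage.
UNITS = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "B": 1}


def parse_mem(value: str) -> int:
    """Convert a size such as '812.3MiB' to bytes."""
    value = value.strip()
    # Check the longer suffixes first, because 'MiB' also ends with 'B'.
    for unit in ("KiB", "MiB", "GiB", "B"):
        if value.endswith(unit):
            return int(float(value[: -len(unit)]) * UNITS[unit])
    raise ValueError(f"unrecognised size: {value!r}")


def parse_stats_line(line: str) -> Tuple[str, int]:
    """Parse one 'name<TAB>used / limit' line into (name, used_bytes)."""
    name, usage = line.split("\t")
    used = usage.split("/")[0]
    return name, parse_mem(used)


def sample() -> Dict[str, int]:
    """Take one snapshot of memory usage for all running containers."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.MemUsage}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return dict(parse_stats_line(l) for l in out.splitlines() if l.strip())


def growth(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Bytes gained per container between two snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}

# Usage (hypothetical): first = sample(); wait a while; print(growth(first, sample()))
```

Running the comparison on both hosts while the instances are idle would show whether the growth is specific to the new host.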

We tried upgrading to the latest version (4.3.0), but the issue is still present.

Here are the notable differences between the two hosts:

Working host:

>uname -a
Linux 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

>docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:58:10 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:56:35 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.8
  GitCommit:        7eba5930496d9bbe375fdf71603e610ad737d2b2
 runc:
  Version:          1.0.0
  GitCommit:        v1.0.0-0-g84113ee
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Faulty host:

>uname -a
Linux 5.15.0-41-generic #44-Ubuntu SMP Wed Jun 22 14:20:53 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

>docker version
Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:02:46 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:00:51 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Dockerfile:

FROM selenium/standalone-chrome:4.1

WORKDIR /app

RUN sudo mkdir hosts
RUN sudo chown -R seluser:seluser hosts

RUN sudo mkdir screenshots
RUN sudo chown -R seluser:seluser screenshots

COPY ./selenium/start.sh start.sh

RUN sudo chmod +x /app/start.sh

HEALTHCHECK CMD curl --fail http://localhost:4444/ || exit 1

CMD ["/app/start.sh"]

/app/start.sh file:

echo $(hostname -I) > "/app/hosts/$(hostname).host"
/opt/bin/entry_point.sh

Appreciate any help in figuring out what's going on.

Command used to start Selenium Grid with Docker

version: '3.8'

services:

  selenium:
    build:
      context: [OMITTED]
      dockerfile: selenium/Dockerfile
    volumes:
      - /dev/shm:/dev/shm
      - ./hosts:/app/hosts
      - ./WorkerScreenshots:/app/screenshots
    scale: 40
    restart: always
    networks:
      mynet:
    deploy:
      resources:
        limits:
          cpus: '1.00'
          memory: 1G

Relevant log output

2022-07-15 12:45:01,008 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2022-07-15 12:45:01,010 INFO supervisord started with pid 14
2022-07-15 12:45:02,013 INFO spawned: 'xvfb' with pid 16
2022-07-15 12:45:02,017 INFO spawned: 'vnc' with pid 17
2022-07-15 12:45:02,020 INFO spawned: 'novnc' with pid 18
2022-07-15 12:45:02,037 INFO spawned: 'selenium-standalone' with pid 27
Setting up SE_NODE_GRID_URL...
2022-07-15 12:45:02,063 INFO success: xvfb entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2022-07-15 12:45:02,063 INFO success: vnc entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2022-07-15 12:45:02,063 INFO success: novnc entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2022-07-15 12:45:02,063 INFO success: selenium-standalone entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Selenium Grid Standalone configuration:
[network]
relax-checks = true

[node]
session-timeout = "300"
override-max-sessions = false
detect-drivers = false
drain-after-session-count = 0
max-sessions = 1

[[node.driver-configuration]]
display-name = "chrome"
stereotype = '{"browserName": "chrome", "browserVersion": "101.0", "platformName": "Linux"}'
max-sessions = 1

Starting Selenium Grid Standalone...
12:45:04.035 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
12:45:04.046 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
12:45:09.548 INFO [NodeOptions.getSessionFactories] - Detected 16 available processors
12:45:09.740 INFO [NodeOptions.report] - Adding chrome for {"browserVersion": "101.0","se:noVncPort": 7900,"browserName": "chrome","platformName": "Linux","se:vncEnabled": true} 1 times
12:45:09.840 INFO [Node.<init>] - Binding additional locator mechanisms: relative, id, name
12:45:09.970 INFO [GridModel.setAvailability] - Switching Node ed30b07f-2de7-4989-baa2-e23c7fb7553c (uri: http://172.23.0.13:4444) from DOWN to UP
12:45:09.971 INFO [LocalDistributor.add] - Added node ed30b07f-2de7-4989-baa2-e23c7fb7553c at http://172.23.0.13:4444. Health check every 120s
12:45:10.753 INFO [Standalone.execute] - Started Selenium Standalone 4.1.4 (revision 535d840ee2): http://172.23.0.13:4444

Operating System

Ubuntu

Docker Selenium version (tag)

4.1.4, 4.3.0

cvalerio, Jul 15 '22 12:07

Unfortunately, I have encountered this problem basically ever since I started using Selenium. It seems that browser instances are sometimes not closed properly by Selenium, leaving them hanging around as zombie processes. My ultimate solution was to programmatically restart the containers at regular intervals.
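
A rough sketch of that workaround, assuming the Docker CLI is available and the container names are known. The function names are hypothetical. Restarts are spread across the interval so that the instances do not all go down together, and a restart will still abort any session running in that container at the time.

```python
import subprocess
from typing import Dict, List


def staggered_offsets(names: List[str], interval_s: float) -> Dict[str, float]:
    """Assign each container a restart offset (in seconds) within one interval.

    Offsets are evenly spaced, so with N containers one restart happens
    every interval_s / N seconds.
    """
    if not names:
        return {}
    step = interval_s / len(names)
    return {name: i * step for i, name in enumerate(sorted(names))}


def restart(name: str) -> None:
    """Restart one container through the Docker CLI."""
    subprocess.run(["docker", "restart", name], check=True)

# Usage (hypothetical): a scheduler sleeps until each offset and calls restart(name).
```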

It is THE big problem with selenium.

Take a look at this

MajesticOl, Aug 11 '22 12:08

Apologies for the late reply. A couple of questions before digging deeper...

  • Have you seen the same with Firefox? There was a similar issue where slowness was reported, and things improved after the reporters upgraded to 4.4. These days our releases are also kept in sync with Chrome releases, so it might be an issue with particular Chrome versions. Trying Firefox would give a comparison.
  • I see the container is limited to 1 CPU. Is the Java process inside the container actually seeing only 1 CPU? I ask because we have an issue (or maybe it is a feature, not sure) with our HTTP client: it starts a thread per CPU for performance reasons and keeps each thread alive with a considerable memory allocation. So I would like to know whether the Java process is actually seeing only one CPU.
  • How are you running tests? I see you are using the Standalone image. How are you directing tests to the right Standalone container?
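
On the CPU question, the startup log in the report contains `Detected 16 available processors`, which matches the host's 16 vCores and not the 1-CPU limit. A minimal sketch for reading the limit the container was actually given follows, so it can be compared with that log line. It assumes the cgroup filesystem is mounted at `/sys/fs/cgroup` inside the container and that a Python interpreter is available there; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """cgroup v2 `cpu.max` holds '<quota> <period>', or 'max <period>' if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_cfs(quota_text: str, period_text: str) -> Optional[float]:
    """cgroup v1: `cpu.cfs_quota_us` is -1 when no limit is set."""
    quota = int(quota_text)
    if quota < 0:
        return None
    return quota / int(period_text)


def container_cpu_limit() -> Optional[float]:
    """Return the CPU limit in cores, or None if unlimited or not found."""
    v2 = Path("/sys/fs/cgroup/cpu.max")
    if v2.exists():
        return parse_cpu_max(v2.read_text())
    quota = Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
    period = Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
    if quota.exists() and period.exists():
        return parse_cfs(quota.read_text(), period.read_text())
    return None

# Usage (hypothetical): run inside the container and print(container_cpu_limit())
```

If this reports 1.0 while the Java log reports 16 processors, the JVM is sizing its thread pools from the host's CPU count and not from the container limit.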

diemol, Aug 30 '22 08:08

I will close this since we did not get more information.

diemol, Jan 11 '23 23:01

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot], Dec 09 '23 00:12