docker-selenium
docker-selenium copied to clipboard
[🐛 Bug]: High CPU usage with 4.31 compared to 4.30
What happened?
I have tried to update our grid, from 4.29 to 4.31 at two separate occasions. Both times I see an abnormal high use of CPU in the kubernetes grid. Even when the nodes are idleing.
Going from an idle CPU usage on the nodes of 0.01 cpu eqvivalents, to 100% on 1 cpu per node (they are allowed 3 cpu each if they need, soo it does not max out).
Normal testrun cpu usage on the left in the image. At 07:10 i updated to 4.31 on three nodes.
Here are the CPU usage in table form.. Observe three nodes with 0.99 usage, and 2 with 0.01 (the latter is normal).
I can't see anything in any log that gives any hints on why this is happening.. I tried turning of Tracing (could not see anything in the traces anyway), but that did not do any changes..
I have tried to anonymize the setup files in this bugreport..
Command used to start Selenium Grid with Docker (or Kubernetes)
Example deployment file for one node, using both chrome and edge, and adding a mtlscert:
apiVersion: apps/v1
kind: Deployment
metadata:
name: selenium-magicnode-XXXXYYYY
labels:
app: selenium-magicnode-XXXXYYYY
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 0
selector:
matchLabels:
app: selenium-magicnode-XXXXYYYY
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
labels:
app: selenium-magicnode-XXXXYYYY
spec:
volumes:
- name: dshm
emptyDir:
medium: Memory
sizeLimit: 2Gi
## Video volume
- name: selenium-hub-volume
persistentVolumeClaim:
claimName: selenium-hub-pvc
## Custom certs volume
- name: cert-volume
secret:
secretName: XXXXYYYY
items:
- key: cert.p12
path: cert.p12
- key: password
path: password
containers:
- name: selenium-magicnode-XXXXYYYY
image: localImage
imagePullPolicy: Always
ports:
- containerPort: 5555
- containerPort: 5900
- containerPort: 4444
## Volume mounts
volumeMounts:
- mountPath: /dev/shm
name: dshm
## Video volume mount
- mountPath: "/videos"
name: selenium-hub-volume
## Custom certs volume mount
- name: cert-volume
mountPath: /opt/selenium/certs
readOnly: true
env:
## Custom certs setup
- name: SE_ENABLE_CUSTOM_CERTS
value: "true"
- name: CERT_TRUST_ATTR
value: "TCu,Cu,Tu"
## Event bus setup
- name: SE_EVENT_BUS_HOST
value: "URL"
- name: SE_EVENT_BUS_SUBSCRIBE_PORT
value: "4443"
- name: SE_EVENT_BUS_PUBLISH_PORT
value: "4442"
- name: SE_NODE_GRID_URL
value: "https://URL"
## Stereotype setup
- name: SE_NODE_STEREOTYPE
value: "[{\"browserName\": \"chrome\"}, {\"browserName\": \"MicrosoftEdge\"}]"
- name: SE_NODE_STEREOTYPE_EXTRA
value: "{\"platformName\": \"Linux\", \"character:name\": \"XXXXYYYY\", \"networkname:applicationName\":\"XXXXYYYY\", \"nodename:applicationName\":\"XXXXYYYY\"}"
- name: SE_NODE_NAME
value: "XXXXYYYY"
- name: SE_OTEL_SERVICE_NAME
value: "XXXXYYYY"
- name: SE_NODE_MAX_SESSIONS
value: "1"
- name: SE_NODE_OVERRIDE_MAX_SESSIONS
value: "false"
## General setup
- name: GENERATE_CONFIG
value: "true"
- name: TZ
value: "Europe/Stockholm"
- name: SE_ENABLE_MANAGED_DOWNLOADS
value: "true"
- name: SE_VNC_NO_PASSWORD
value: "1"
## Tracing setup
- name: SE_ENABLE_TRACING
value: "false"
- name: SE_OTEL_TRACES_EXPORTER
value: "otlp"
- name: SE_OTEL_EXPORTER_ENDPOINT
value: "http://selenium-hub-jaeger-svc:4317"
## Video setup
- name: SE_RECORD_VIDEO
value: "true"
- name: SE_VIDEO_FILE_NAME
value: "auto"
## Security context
securityContext:
runAsNonRoot: true
runAsUser: 1200
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
#seccompProfile:
# type: RuntimeDefault
## Probes
readinessProbe:
failureThreshold: 3
httpGet:
path: /status
port: 5555
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
startupProbe:
failureThreshold: 3
httpGet:
path: /status
port: 5555
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
livenessProbe:
failureThreshold: 3
httpGet:
path: /status
port: 5555
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
## Resources (1 cpu per browser at least, and 1 gig of memory per browser)
resources:
limits:
memory: "4Gi"
cpu: "3"
requests:
memory: "100Mi"
cpu: "0.1"
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: docker-credentials
restartPolicy: Always
schedulerName: default-scheduler
terminationGracePeriodSeconds: 30
Relevant log output
Installing certificates...
2025-04-15T07:08:22.970316001Z Found certificate file
2025-04-15T07:08:22.980665242Z Installing certificate in: /home/seluser/.pki/nssdb
pk12util: no nickname for cert in PKCS12 file.
2025-04-15T07:08:23.015961136Z pk12util: using nickname: XXXXYYYY
2025-04-15T07:08:23.015963188Z pk12util: PKCS12 IMPORT SUCCESSFUL
2025-04-15T07:08:23.021365918Z Virtual environment detected at /opt/venv, activating...
2025-04-15T07:08:23.025128350Z Python 3.12.3
2025-04-15T07:08:23.153845714Z 2025-04-15 09:08:23,153 INFO Included extra file "/etc/supervisor/conf.d/chrome-cleanup.conf" during parsing
2025-04-15T07:08:23.153896896Z 2025-04-15 09:08:23,153 INFO Included extra file "/etc/supervisor/conf.d/edge-cleanup.conf" during parsing
2025-04-15T07:08:23.153899548Z 2025-04-15 09:08:23,153 INFO Included extra file "/etc/supervisor/conf.d/firefox-cleanup.conf" during parsing
2025-04-15T07:08:23.153901289Z 2025-04-15 09:08:23,153 INFO Included extra file "/etc/supervisor/conf.d/recorder.conf" during parsing
2025-04-15 09:08:23,153 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2025-04-15T07:08:23.153933914Z 2025-04-15 09:08:23,153 INFO Included extra file "/etc/supervisor/conf.d/uploader.conf" during parsing
2025-04-15 09:08:23,156 INFO RPC interface 'supervisor' initialized
2025-04-15T07:08:23.156974433Z 2025-04-15 09:08:23,156 INFO supervisord started with pid 15
2025-04-15T07:08:24.159579444Z 2025-04-15 09:08:24,159 INFO spawned: 'video-ready' with pid 16
2025-04-15T07:08:24.160791824Z 2025-04-15 09:08:24,160 INFO spawned: 'xvfb' with pid 17
2025-04-15T07:08:24.161872245Z 2025-04-15 09:08:24,161 INFO spawned: 'vnc' with pid 18
2025-04-15 09:08:24,162 INFO spawned: 'novnc' with pid 19
2025-04-15T07:08:24.164137266Z 2025-04-15 09:08:24,163 INFO spawned: 'video-recording' with pid 20
2025-04-15T07:08:24.165300122Z 2025-04-15 09:08:24,165 INFO spawned: 'selenium-node' with pid 21
2025-04-15T07:08:24.173657199Z 2025-04-15 09:08:24,173 INFO success: video-ready entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2025-04-15T07:08:24.173701086Z 2025-04-15 09:08:24,173 INFO success: video-recording entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2025-04-15T07:08:24.173705054Z 2025-04-15 09:08:24,173 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2025-04-15T07:08:24.184559641Z 2025-04-15 07:08:24,183 [endpoint.checks] - Endpoint http://localhost:5555/status is not available - status code: 000
2025-04-15T07:08:24.186096829Z 2025-04-15 07:08:24,185 [video.recorder] - Video folder exists: /videos
2025-04-15 07:08:24,188 [video.recorder] - Waiting for the display localhost:99.0 is open
2025-04-15T07:08:24.635531632Z Appending Selenium option: --register-period 120
2025-04-15T07:08:24.635557410Z Appending Selenium option: --register-cycle 10
2025-04-15T07:08:24.635562310Z Appending Selenium option: --register-shutdown-on-failure
2025-04-15T07:08:24.635565577Z Appending Selenium option: --heartbeat-period 30
2025-04-15T07:08:24.635608173Z Appending Selenium option: --log-level INFO
2025-04-15T07:08:24.635638529Z Appending Selenium option: --http-logs false
2025-04-15T07:08:24.635709830Z Appending Selenium option: --structured-logs false
2025-04-15T07:08:24.635806723Z Generating Selenium Config
2025-04-15T07:08:24.637985153Z Configuring server...
2025-04-15T07:08:24.637995901Z Setting up SE_NODE_HOST...
Setting up SE_NODE_PORT...
2025-04-15T07:08:24.719084240Z Appending Selenium option: --tracing false
2025-04-15T07:08:24.719124024Z Tracing is disabled
2025-04-15T07:08:24.719135278Z Appending Selenium option: --bind-host false
2025-04-15T07:08:24.719181178Z Appending Selenium option: --config /opt/selenium/config.toml
2025-04-15T07:08:24.719195717Z Selenium Grid Node configuration:
2025-04-15T07:08:24.719952064Z [events]
2025-04-15T07:08:24.719961462Z publish = "tcp://URL:4442"
subscribe = "tcp://URL:4443"
2025-04-15T07:08:24.719965953Z
2025-04-15T07:08:24.719967871Z [node]
2025-04-15T07:08:24.719970125Z grid-url = "https://URL"
2025-04-15T07:08:24.719972171Z session-timeout = 300
2025-04-15T07:08:24.719974326Z override-max-sessions = false
2025-04-15T07:08:24.719976111Z detect-drivers = false
2025-04-15T07:08:24.719978002Z drain-after-session-count = 0
max-sessions = 1
2025-04-15T07:08:24.719981495Z
2025-04-15T07:08:24.719983397Z [[node.driver-configuration]]
2025-04-15T07:08:24.719985216Z display-name = "chrome"
2025-04-15T07:08:24.719988649Z stereotype = '{"browserName": "chrome", "platformName": "Linux", "character:name": "XXXXYYYY", "networkname:applicationName": "XXXXYYYY", "nodename:applicationName": "XXXXYYYY"}'
max-sessions = 1
2025-04-15T07:08:24.720012290Z
2025-04-15T07:08:24.720014083Z [[node.driver-configuration]]
2025-04-15T07:08:24.720016377Z display-name = "MicrosoftEdge"
2025-04-15T07:08:24.720019395Z stereotype = '{"browserName": "MicrosoftEdge", "platformName": "Linux", "character:name": "XXXXYYYY", "networkname:applicationName": "XXXXYYYY", "nodename:applicationName": "XXXXYYYY"}'
2025-04-15T07:08:24.720021222Z max-sessions = 1
2025-04-15T07:08:24.720022874Z
2025-04-15T07:08:24.720197219Z Starting Selenium Grid Node...
2025-04-15T07:08:24.720326612Z Using JAVA_OPTS: -Dwebdriver.remote.enableTracing=false -Dwebdriver.httpclient.version=HTTP_1_1
2025-04-15T07:08:25.197832226Z 2025-04-15 07:08:25,197 [video.recorder] - Display localhost:99.0 is open with dimensions 1920x1080
2025-04-15 09:08:25,197 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-04-15T07:08:25.198206730Z 2025-04-15 09:08:25,198 INFO success: vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-04-15T07:08:25.198247735Z 2025-04-15 09:08:25,198 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-04-15T07:08:25.199167220Z 2025-04-15 07:08:25,198 [video.recorder] - Waiting for Node endpoint responds
2025-04-15T07:08:25.200223115Z 09:08:25.195 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
2025-04-15T07:08:25.204500538Z 09:08:25.204 INFO [LoggingOptions.getTracer] - Using null tracer
2025-04-15T07:08:25.263847842Z 09:08:25.263 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://url:4442 and tcp://url:4443
2025-04-15T07:08:25.344603556Z 09:08:25.344 INFO [UnboundZmqEventBus.<init>] - Sockets created
09:08:26.346 INFO [UnboundZmqEventBus.<init>] - Event bus ready
09:08:26.492 INFO [NodeServer.createHandlers] - Reporting self as: http://10.57.2.117:5555
09:08:26.495 INFO [LoggingOptions.getTracer] - Using null tracer
09:08:26.508 INFO [NodeOptions.getSessionFactories] - Detected 3 available processors
09:08:26.547 INFO [NodeOptions.report] - Adding MicrosoftEdge for {"browserName": "MicrosoftEdge","character:name": "XXXXYYYY","networkname:applicationName": "XXXXYYYY","nodename:applicationName": "XXXXYYYY","platformName": "linux","se:noVncPort": 7900,"se:vncEnabled": true} 1 times
09:08:26.548 INFO [NodeOptions.report] - Adding chrome for {"browserName": "chrome","character:name": "XXXXYYYY","networkname:applicationName": "XXXXYYYY","nodename:applicationName": "XXXXYYYY","platformName": "linux","se:noVncPort": 7900,"se:vncEnabled": true} 1 times
09:08:26.580 INFO [Node.<init>] - Binding additional locator mechanisms: relative
09:08:26.714 INFO [NodeServer$2.start] - Starting registration process for Node http://10.57.2.117:5555
09:08:26.715 INFO [NodeServer.execute] - Started Selenium node 4.31.0 (revision 4ae8fc9f8a): http://10.57.2.117:5555
09:08:26.736 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
09:08:26.825 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
2025-04-15 07:08:27,218 [video.recorder] - Node endpoint is responding now. Proceeding next steps...
Operating System
Kubernetes / Ranger / Argo
Docker Selenium version (image tag)
4.31
Selenium Grid chart version (chart version)
@parholmdahl, thank you for creating this issue. We will troubleshoot it as soon as we can.
Info for maintainers
Triage this issue by using labels.
If information is missing, add a helpful comment and then I-issue-template label.
If the issue is a question, add the I-question label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C),
add the applicable G-* label, and it will provide the correct link and auto-close the
issue.
After troubleshooting the issue, please add the R-awaiting answer label.
Thank you!
In Slack channel we also get same feedback. In this case, can you try to execute into container, and execute command top to see which process consume much resources?
Yes!
This is from one that is acting normal (using 4.30):
And this is from one that is high CPU usage (using 4.31):
And what does this tell us? That it is not docker related i guess?
To classify the issue from Grid core or browser/driver version. You can setup few Node with image tag using Grid 4.31.0 with browser/driver Chrome/Edge v134. You can checkout image tags available https://github.com/SeleniumHQ/docker-selenium/blob/trunk/CHANGELOG/4.31.0/edge_134.md, https://github.com/SeleniumHQ/docker-selenium/blob/trunk/CHANGELOG/4.31.0/chrome_134.md
Sorry, I have been in the middle of a release, so I have not had the time to test.. But today i tested with the nightly build (guessing this would be eqvivalent with the non built 4.32 release?)
This is what I see in "top" inside the container:
So the problem i still present.
I will try to build with another Chrome version..
I tried deploying 4.32 with "our old way" (that is first build an image that contains chrome, edge and firefox, and then add mTLS certificate to that images in another build).. Same issue with around 10-12% cpu on that java process..
I then thought to skip the last buildstep, so no mTLS adding.. Same issue..
As a last resort i did a "vanilla take" with the following deploycode:
`apiVersion: apps/v1
kind: Deployment
metadata:
name: selenium-vanillanode-invanare
labels:
app: selenium-vanillanode-invanare
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 0
selector:
matchLabels:
app: selenium-vanillanode-invanare
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
labels:
app: selenium-vanillanode-invanare
spec:
volumes:
- name: dshm
emptyDir:
medium: Memory
sizeLimit: 2Gi
## Video volume
- name: selenium-hub-volume
persistentVolumeClaim:
claimName: selenium-hub-pvc
containers:
- name: selenium-vanillanode-invanare
image: selenium/node-chrome:134.0.6998.165-chromedriver-134.0.6998.165-grid-4.31.0-20250414
imagePullPolicy: Always
ports:
- containerPort: 5555
- containerPort: 5900
- containerPort: 4444
## Volume mounts
volumeMounts:
- mountPath: /dev/shm
name: dshm
## Video volume mount
- mountPath: "/videos"
name: selenium-hub-volume
env:
## Event bus setup
- name: SE_EVENT_BUS_HOST
value: "<OUR INTERNAL HOST FOR HUB>"
- name: SE_EVENT_BUS_SUBSCRIBE_PORT
value: "4443"
- name: SE_EVENT_BUS_PUBLISH_PORT
value: "4442"
- name: SE_NODE_GRID_URL
value: "<OUR INTERNAL HOST FOR HUB>"
## Stereotype setup
- name: SE_NODE_STEREOTYPE
value: "{\"browserName\": \"chrome\"}"
- name: SE_NODE_STEREOTYPE_EXTRA
value: "{\"platformName\": \"Linux\", \"character:name\": \"invanare\", \"networkname:applicationName\":\"invanare\", \"nodename:applicationName\":\"invanare\"}"
- name: SE_NODE_NAME
value: "vanillanode-invanare"
- name: SE_OTEL_SERVICE_NAME
value: "vanillanode-invanare"
- name: SE_NODE_MAX_SESSIONS
value: "1"
- name: SE_NODE_OVERRIDE_MAX_SESSIONS
value: "false"
## General setup
- name: GENERATE_CONFIG
value: "true"
- name: TZ
value: "Europe/Stockholm"
- name: SE_ENABLE_MANAGED_DOWNLOADS
value: "true"
- name: SE_VNC_NO_PASSWORD
value: "1"
## Tracing setup
- name: SE_ENABLE_TRACING
value: "false"
- name: SE_OTEL_TRACES_EXPORTER
value: "otlp"
- name: SE_OTEL_EXPORTER_ENDPOINT
value: "<OUR INTERNAL HOST FOR JAEGER>"
## Video setup
- name: SE_RECORD_VIDEO
value: "true"
- name: SE_VIDEO_FILE_NAME
value: "auto"
## Security context
securityContext:
runAsNonRoot: true
runAsUser: 1200
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
#seccompProfile:
# type: RuntimeDefault
## Probes
readinessProbe:
failureThreshold: 3
httpGet:
path: /status
port: 5555
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
startupProbe:
failureThreshold: 3
httpGet:
path: /status
port: 5555
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
livenessProbe:
failureThreshold: 3
httpGet:
path: /status
port: 5555
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
## Resources (1 cpu per browser at least, and 1 gig of memory per browser)
resources:
limits:
memory: "4Gi"
cpu: "3"
requests:
memory: "100Mi"
cpu: "0.1"
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: docker-credentials
restartPolicy: Always
schedulerName: default-scheduler
terminationGracePeriodSeconds: 30 `
But the problem persists.. Still 10-12% cpu when using "top" command:
One thing that i have observed is that the new problematic builds seem to find an environment under /opt/venv and enters that environment at the very start of the pod..
Don't know if this has anything at all to do with the problem..
Tested latest nightly build today, to see if the problem persists, and unfortunately it does..
I also tested to upgrade the hub to 4.32 and the hub has no issues with high CPU. Then i tried to run the lates nightly build (as vanilla, no additions or updates) but still high CPU from that node..
Today i tried running the latest nightly build again, with no additions (straight from hub.docker.com), but i added SE_LOG_LEVEL to environment, and set it to DEBUG.. Then i got a crashloop with this log:
Virtual environment detected at /opt/venv, activating... Python 3.12.3 2025-05-26 18:01:16,609 INFO Included extra file "/etc/supervisor/conf.d/chrome-cleanup.conf" during parsing 2025-05-26T16:01:16.609596037Z 2025-05-26 18:01:16,609 INFO Included extra file "/etc/supervisor/conf.d/recorder.conf" during parsing 2025-05-26T16:01:16.609602595Z 2025-05-26 18:01:16,609 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing 2025-05-26T16:01:16.609668239Z 2025-05-26 18:01:16,609 INFO Included extra file "/etc/supervisor/conf.d/uploader.conf" during parsing 2025-05-26T16:01:16.612496099Z 2025-05-26 18:01:16,612 INFO RPC interface 'supervisor' initialized 2025-05-26T16:01:16.612777477Z 2025-05-26 18:01:16,612 INFO supervisord started with pid 9 2025-05-26T16:01:17.615266940Z 2025-05-26 18:01:17,614 INFO spawned: 'video-ready' with pid 10 2025-05-26T16:01:17.617358486Z 2025-05-26 18:01:17,617 INFO spawned: 'xvfb' with pid 11 2025-05-26T16:01:17.618946686Z 2025-05-26 18:01:17,618 INFO spawned: 'vnc' with pid 12 2025-05-26T16:01:17.620598526Z 2025-05-26 18:01:17,620 INFO spawned: 'novnc' with pid 13 2025-05-26T16:01:17.622225852Z 2025-05-26 18:01:17,622 INFO spawned: 'video-recording' with pid 14 2025-05-26T16:01:17.624020341Z 2025-05-26 18:01:17,623 INFO spawned: 'selenium-node' with pid 15 2025-05-26T16:01:17.634112111Z 2025-05-26 18:01:17,634 INFO success: video-ready entered RUNNING state, process has stayed up for > than 0 seconds (startsecs) 2025-05-26 18:01:17,634 INFO success: video-recording entered RUNNING state, process has stayed up for > than 0 seconds (startsecs) 2025-05-26T16:01:17.634405737Z 2025-05-26 18:01:17,634 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs) 2025-05-26T16:01:17.635227439Z 2025-05-26 16:01:17,634 [video.recorder] - Video folder exists: /videos 2025-05-26T16:01:17.638692975Z 2025-05-26 16:01:17,638 [video.recorder] - Waiting for the display localhost:99.0 is open Appending Selenium option: --register-period 120 Appending Selenium option: --register-cycle 10 Appending Selenium option: --register-shutdown-on-failure Appending Selenium option: --heartbeat-period 30 Appending Selenium option: --log-level DEBUG Appending Selenium option: --http-logs false Appending Selenium option: --structured-logs false Generating Selenium Config Configuring server... Setting up SE_NODE_HOST... Setting up SE_NODE_PORT... Merging SE_NODE_STEREOTYPE_EXTRA={"platformName": "Linux", "character:name": "invanare", "networkname:applicationName":"invanare", "nodename:applicationName":"invanare"} to main stereotype Merged stereotype: {"browserName":"chrome","platformName":"Linux","character:name":"invanare","networkname:applicationName":"invanare","nodename:applicationName":"invanare"} Appending Selenium option: --tracing false 2025-05-26T16:01:18.033574727Z Tracing is disabled Appending Selenium option: --bind-host false Appending Selenium option: --config /opt/selenium/config.toml 2025-05-26T16:01:18.034349475Z Selenium Grid Node configuration: [events] 2025-05-26T16:01:18.035372086Z publish = "tcp://selenium-hub-svc.selenium-hub.svc.cluster.local:4442" 2025-05-26T16:01:18.035399616Z subscribe = "tcp://selenium-hub-svc.selenium-hub.svc.cluster.local:4443" 2025-05-26T16:01:18.035406356Z [node] 2025-05-26T16:01:18.035417330Z grid-url = "https://selenium-hub.tnmt.nordicmedtest.se" 2025-05-26T16:01:18.035422827Z session-timeout = 300 2025-05-26T16:01:18.035428297Z override-max-sessions = false 2025-05-26T16:01:18.035433379Z detect-drivers = false 2025-05-26T16:01:18.035439058Z drain-after-session-count = 0 max-sessions = 1 2025-05-26T16:01:18.035449606Z 2025-05-26T16:01:18.035455121Z [[node.driver-configuration]] display-name = "chrome" 2025-05-26T16:01:18.035468582Z stereotype = '{"browserName":"chrome","platformName":"Linux","character:name":"invanare","networkname:applicationName":"invanare","nodename:applicationName":"invanare"}' max-sessions = 1 2025-05-26T16:01:18.035479320Z Starting Selenium Grid Node... Using JAVA_OPTS: -Dwebdriver.remote.enableTracing=false -Dwebdriver.httpclient.version=HTTP_1_1 2025-05-26 18:01:18,431 WARN exited: selenium-node (exit status 1; not expected) 2025-05-26T16:01:18.431276931Z 2025-05-26 18:01:18,431 WARN received SIGINT indicating exit request 2025-05-26T16:01:18.431373533Z 2025-05-26 18:01:18,431 INFO waiting for video-ready, xvfb, vnc, novnc, video-recording to die 2025-05-26 16:01:18,647 [video.recorder] - Display localhost:99.0 is open with dimensions 1920x1080 2025-05-26T16:01:18.647711711Z 2025-05-26 18:01:18,647 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2025-05-26T16:01:18.647721684Z 2025-05-26 18:01:18,647 INFO success: vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2025-05-26T16:01:18.647759318Z 2025-05-26 18:01:18,647 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2025-05-26 16:01:18,649 [video.recorder] - Waiting for Node endpoint responds 2025-05-26 16:01:18,654 [video.recorder] - Trapped SIGTERM/SIGINT/x so shutting down recorder 2025-05-26 18:01:18,661 WARN received SIGTERM indicating exit request 2025-05-26 16:01:18,662 [video.recorder] - Ready to shutdown the recorder 2025-05-26 16:01:18,663 [video.recorder] - Trapped SIGTERM/SIGINT/x so shutting down recorder 2025-05-26 16:01:18,669 [video.recorder] - Ready to shutdown the recorder 2025-05-26 18:01:18,669 WARN received SIGTERM indicating exit request 2025-05-26 18:01:18,669 INFO stopped: video-recording (exit status 0) 2025-05-26 18:01:19,172 WARN stopped: novnc (terminated by SIGTERM) 2025-05-26 18:01:19,173 WARN stopped: vnc (terminated by SIGTERM) 2025-05-26 18:01:20,175 WARN stopped: xvfb (terminated by SIGTERM) 2025-05-26 18:01:20,176 WARN stopped: video-ready (terminated by SIGKILL)
Actually, DEBUG is not a support value, the right values should be FINE, FINER, FINEST After this https://github.com/SeleniumHQ/selenium/issues/15790, we will have a better hint for the user to acknowledge.
Oh! My bad.. I will try FINEST then..
Tried FINEST now, and it just floooooded the logs (remember, the node is not up for users to use.. It should be "silent"?)...
selenium-vanillanode-invanare-f86f58844-9n72r_selenium-vanillanode-invanare.log
It looks like the /status endpoint gets a new GET request like 2000 times a second? Seems a bit rough?
If 2000 times a second, in this case, how many Nodes are connecting to the Hub at that time?
- If /status endpoint of Node is hitting a lot, it might come from video recorder container, container health check, heartbeat interval (Hub audit Node status)
- If /status endpoint of Hub is hitting a lot, it might container health check, or external traffic fetches Hub status for a some tasks
This is on a Node (with image selenium/node-chrome:nightly).. If i change to a node of 4.30 version (selenium/node-chrome:4.30) I don't have the elevated CPU usage, and don't see this many logs in FINEST level.. Same setup, only change image...
I asked Claude AI to compare the two logfiles, and this is their conclusion:
We've identified a significant performance issue with the latest nightly build of selenium/node-chrome when using SE_LOG_LEVEL: "FINEST". The nightly version produces approximately 47x more log output compared to version 4.30, causing severe resource consumption and container crashes. Environment Kubernetes deployment with health checks configured Health check frequency: Every 10 seconds via /status endpoint Additional monitoring: ~1 request per second to /status endpoint Log level: FINEST Comparison Results Version 4.30 (selenium/node-chrome:4.30): Runtime: 46 seconds before termination Total log lines: 1,278 Log frequency: ~28 lines per second Behavior: Stable operation, normal termination Nightly version (selenium/node-chrome:nightly): Runtime: 13 seconds before termination Total log lines: 16,963 Log frequency: ~1,305 lines per second Behavior: Excessive CPU usage (~10% constant), early container crash Analysis The nightly version appears to have either: Significantly more verbose FINEST logging implementation A logging bug causing excessive output Both issues combined Each /status HTTP request in the nightly version generates substantially more debug output compared to version 4.30, creating a feedback loop where the logging itself becomes a resource burden. Impact Container crashes due to resource exhaustion High CPU consumption even when idle Unusable performance with FINEST log level 47x increase in log volume compared to stable version
I think that this is why I see that high CPU usage in later versions (I still run all our nodes on 4.30 because of this)...
I am shooting from the hip here, but could it have something to do with this change: https://github.com/SeleniumHQ/selenium/pull/15448 ?
I have not really checked the code changes (I SUCK at JAVA).. But it seems to handle status checks for more values than before?
That PR tried to add one more attributeregistered, to the status response.
But probably it might cause the hitting, since some logic tries to check node.getAvailability() == UP. I will investigate this point.
I tried to use TCP probe only from kubernetes: ## Probes startupProbe: tcpSocket: port: 5555 periodSeconds: 10
And got rid of the other probes.. But i still got high CPU, and when AI controlled the log, it says that it has approx 295 logs per second to the /status endpoint (I am guessing theese come from the hub then?).. compared to 4.30 version that has about 1 per second..
This smells fishy! :)
@VietND96 @parholmdahl After taking a look at the logs that you've provided and reading the whole thread I want to chime in here.
So I think all this polling comes from the video feature that is activated and there are a couple of things that point into that direction in the logs:
- The requests seem to come from curl which makes me thing it comes directly from some kind of script instead of the hub user-agent. Java status request:
GET /status HTTP/1.1
Host: 10.57.3.44:5555
user-agent: selenium/4.32.0 (java unix)
'Spammy ' status request:
GET /status HTTP/1.1
Host: localhost:5555
User-Agent: curl/8.11.0
Accept: */*
- The 'spammy' status requests seem to start after the following log line
Node endpoint is responding now. Proceeding next steps..., even though you have an initial status request before that.
After doing a quick search through the project I see that the log line highlighted on the second point comes from this script (https://github.com/SeleniumHQ/docker-selenium/blob/bb903724ce9046dfdd7907d2b480237d39ab1587/Video/video.sh). I suck at bash but my guess is that something fishy is happening on that script at line 252 onwards, especially the curl request that is sent there.
You can confirm this by either deactivating the video feature:
- name: SE_RECORD_VIDEO
value: "false"
---or more of a longshot on my end, getting rid of the double quotes around the session_id on line 253 since it might be doing some weird string interpolation.
session_id=$(jq -r "${JQ_SESSION_ID_QUERY}" "/tmp/status.json")---
PS: Disregard last part since I don't think that's the actual issue.
@Oxilod, thank you for your detailed analysis. Actually, one limitation of video recording is relying on /status to trigger start/stop recording per session, and it fetches the status very frequently
poll_interval=${SE_VIDEO_POLL_INTERVAL:-1}
If your test in a session took a few minutes, you can increase this poll interval via this env var SE_VIDEO_POLL_INTERVAL to 10 or 20 (unit in seconds) to see any improvement.
But correct me if I am wrong. You had the video solution already in 4.30 already (I am not having standalone video containers). So this must be a combination of factors, where the new variables on /status is the culprit.
You're right, increasing the poll interval of the video recorder will not be the final resolution for this case, it would be a workaround to evaluate how it helps to improve at this time before having the next release with a possible fix in Grid core.
So, i tried theese last workarounds.
Setting "SE_RECORD_VIDEO" to false, had a good impact on the CPU load (back to 0-0.1 in idle)
Resetting "SE_RECORD_VIDEO" to true, and changing "SE_VIDEO_POLL_INTERVAL" to 20, did not change the CPU load. Then we were back at at least 10% CPU on idle.
This was with latest nightly in "vanilla" mode. "selenium/node-chrome:nightly"
Ah ok, so video recording script do something consumes much CPU, let me try to deliver the improvement soon!
Please check image tag 20250606. I will continue improve this in next Grid core release
I deployed the version that you recommended, and it looks right now that the problem is gone.. I let this run throught the day, and see if it persists.
I had no more issues after this, i guess we see this fix in 4.34?
I am closing this with resolution in #2856
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.