docker-selenium
[🐛 Bug]: Health check making nodes unavailable in Kubernetes, getting error - "Could not start a new session. Response code 500. Message"
What happened?
This behaviour is only observed when running on Kubernetes (via deployment.yaml); it is not observed when the same images run on plain Linux servers.
We have observed that, starting from v4.29, the Selenium Grid health check makes nodes unavailable during the check, causing tests to fail with errors such as "org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Response code 500. Message: Could not start a new session. java.net.ConnectException" and "java.lang.NullPointerException: Cannot invoke "org.openqa.selenium.TakesScreenshot.getScreenshotAs(org.openqa.selenium.OutputType)" because "driver" is null".
Command used to start Selenium Grid with Docker (or Kubernetes)
kubectl apply -f hub.yaml
Relevant log output
14:49:45.487 INFO [LocalDistributor.add] - Added node b68b3b27-0e5c-40c4-82af-59dd1ecc4212 at https://node-chrome:5151. Health check every 120s
14:50:17.284 INFO [GridModel.setAvailability] - Switching Node b70e99a7-5d2c-45ec-b261-9f1d87d79744 (uri: https://node-chrome:5454) from UP to DOWN
14:52:17.265 INFO [GridModel.setAvailability] - Switching Node 7570b122-08d9-43f4-aff8-3276defc251d (uri: https://node-chrome:5252) from UP to DOWN
Operating System
Linux
Docker Selenium version (image tag)
4.31.0-20250414
Selenium Grid chart version (chart version)
No response
@ketanb02, thank you for creating this issue. We will troubleshoot it as soon as we can.
Info for maintainers
Triage this issue by using labels.
If information is missing, add a helpful comment and then the I-issue-template label.
If the issue is a question, add the I-question label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C),
add the applicable G-* label, and it will provide the correct link and auto-close the
issue.
After troubleshooting the issue, please add the R-awaiting answer label.
Thank you!
Hi, I think you should enable SE_LOG_LEVEL=FINE in both Hub and Node to see a few more debug logs behind the health checks.
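For reference, setting that log level through the container environment would look something like this (a minimal sketch of the env entry to add to both the Hub and Node deployments; the variable name is the one mentioned above):

```yaml
# Add to the env: list of both the Hub and Node containers
- name: SE_LOG_LEVEL
  value: 'FINE'
```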
Hello, no logs related to health checks were generated even after adding --log-level FINE. Attaching a UI console screenshot below for reference, showing how the nodes get grayed out intermittently.
Logs with FINE at hub end. Nothing at node end.
10:59:58.971 DEBUG [HttpTracing.inject] - Injecting (GET) /status into org.openqa.selenium.remote.tracing.empty.NullContext@31a66910 at org.openqa.selenium.grid.node.remote.RemoteNode:252
10:59:58.972 DEBUG [JdkHttpClient.execute0] - Executing request: (GET) /status
10:59:58.980 DEBUG [JdkHttpClient.execute0] - Ending request (GET) /status in 7ms
10:59:58.980 DEBUG [LocalDistributor.updateNodeAvailability] - Health check result for https://node-chrome:5252 was DOWN
10:59:58.980 INFO [GridModel.setAvailability] - Switching Node 7570b122-08d9-43f4-aff8-3276defc251d (uri: https://node-chrome:5252) from UP to DOWN
This is happening only on the Kubernetes setup, and only from 4.29 onwards; up to 4.28.1 it was working fine.
Just added a comparison screenshot of v4.28 against the later versions where we are facing the issue.
Hello Team, any advice on this issue? Thank you!
I saw these logs:
14:49:45.487 INFO [LocalDistributor.add] - Added node b68b3b27-0e5c-40c4-82af-59dd1ecc4212 at https://node-chrome:5151. Health check every 120s
14:50:17.284 INFO [GridModel.setAvailability] - Switching Node b70e99a7-5d2c-45ec-b261-9f1d87d79744 (uri: https://node-chrome:5454) from UP to DOWN
14:52:17.265 INFO [GridModel.setAvailability] - Switching Node 7570b122-08d9-43f4-aff8-3276defc251d (uri: https://node-chrome:5252) from UP to DOWN
Are you using the Service name for the Node connection from the Hub? Can you share the YAML files you used to deploy the Hub and Nodes?
Sorry for the delay
Hub Service YAML

apiVersion: v1
kind: Service
metadata:
  name: selenium-k8s-hub
  namespace: production
  labels:
    app: selenium-k8s-hub
spec:
  ports:
    - port: 4444
      targetPort: 4444
      name: port0
    - port: 4443
      targetPort: 4443
      name: port1
    - port: 4442
      targetPort: 4442
      name: port2
  selector:
    app: selenium-k8s-hub
Hub Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: selenium-k8s-hub
  namespace: production
  labels:
    app: selenium-k8s-hub
spec:
  replicas: 1
  selector:
    matchLabels:
      app: selenium-k8s-hub
  template:
    metadata:
      labels:
        app: selenium-k8s-hub
    spec:
      containers:
        - name: selenium-k8s-hub
          image: selenium/hub:4.31.0-20250414
          shm_size: '4gb'
          privileged: true
          ports:
            - containerPort: 4444
            - containerPort: 4443
            - containerPort: 4442
          livenessProbe:
            httpGet:
              scheme: HTTPS
              path: /status
              port: 4444
            initialDelaySeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              scheme: HTTPS
              path: /status
              port: 4444
            initialDelaySeconds: 30
            timeoutSeconds: 5
          env:
            - name: SE_OPTS
              value: '--https-certificate /home/seluser/server.pem --https-private-key /home/seluser/server.key'
            - name: SE_ENABLE_TRACING
              value: 'false'
            - name: SE_HUB_HOST
              value: 'selenium-k8s-hub'
            - name: SE_SESSION_REQUEST_TIMEOUT
              value: '900'
            - name: SE_SESSION_RETRY_INTERVAL
              value: '10'
          volumeMounts:
            - mountPath: /home/seluser
              name: vol-map
      volumes:
        - name: vol-map
          hostPath:
            path: /opt/grid/certs
            type: Directory
Node Service YAML

apiVersion: v1
kind: Service
metadata:
  name: node-chrome
  labels:
    name: node-chrome
spec:
  selector:
    app: node-chrome
  ports:
    - name: nodeport
      protocol: TCP
      port: 5555
      targetPort: 5555
Node Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-chrome1
  namespace: production
  labels:
    app: node-chrome
spec:
  replicas: 1
  selector:
    matchLabels:
      app: node-chrome
  template:
    metadata:
      labels:
        app: node-chrome
    spec:
      terminationGracePeriodSeconds: 3600
      containers:
        - name: node-chrome
          image: selenium/node-chrome:4.31.0-20250414
          privileged: true
          shm_size: 1gb
          env:
            - name: SE_OPTS
              value: "--https-certificate /home/seluser/server.pem --https-private-key /home/seluser/server.key"
            - name: SE_EVENT_BUS_HOST
              value: "selenium-k8s-hub"
            - name: SE_EVENT_BUS_PUBLISH_PORT
              value: '4442'
            - name: SE_EVENT_BUS_SUBSCRIBE_PORT
              value: '4443'
            - name: SE_NODE_SESSION_TIMEOUT
              value: '600'
            - name: SE_ENABLE_TRACING
              value: 'false'
            - name: SE_NODE_HOST
              value: 'node-chrome'
            - name: SE_NODE_PORT
              value: '5555'
            - name: SE_DRAIN_AFTER_SESSION_COUNT
              value: '100'
          ports:
            - containerPort: 5555
              protocol: TCP
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
            - name: vol-map
              mountPath: /home/seluser
          resources:
            requests:
              cpu: "1"
              ephemeral-storage: 100Mi
              memory: 1Gi
            limits:
              memory: 1Gi
              ephemeral-storage: 150Mi
              cpu: "1"
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
        - name: vol-map
          hostPath:
            path: /opt/grid/certs
            type: Directory
Hello Team,
We would greatly appreciate it if you could share any possible solutions or suggestions regarding this issue.
Thank you for your support.
Hello Team,
I just wanted to check in and ask whether there is any update on this issue.
Please let me know if there's anything I can assist with or if further input is needed from my side.
Thank you so much for your time and support.
Hello Team,
Could you please share any possible solutions or suggestions regarding this issue?
Thank you.
These two env vars:

- name: SE_HUB_HOST
  value: 'selenium-k8s-hub'
- name: SE_NODE_HOST
  value: 'node-chrome'

Can you change them to:

- name: SE_HUB_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: SE_NODE_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
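For context, this uses the Kubernetes Downward API so that each Pod registers with its own Pod IP rather than a shared Service name; presumably the Hub's health check then reaches the specific Pod directly instead of whichever endpoint the Service resolves to. A sketch of how the change would sit in the Node container's env list (the event-bus entry is unchanged from the YAML shared earlier in this thread):

```yaml
env:
  - name: SE_EVENT_BUS_HOST
    value: "selenium-k8s-hub"
  # Register the node under its own Pod IP via the Downward API
  - name: SE_NODE_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
```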
Thank you for your response.
Will try it and update with the output.