docker-selenium
[🐛 Bug]: Nodes stuck in DRAINING
What happened?
I'm using KEDA with drain-after-session-count = 1.
After the session finishes, the node is stuck in DRAINING and won't deprovision.
At the same time I see these messages in the Grid:
2025-02-12 17:50:57,374 DEBUG [Probe.Liveness] - Session Queue Size: 60, Session Count: 0, Max Session: 0
2025-02-12 17:50:57,375 DEBUG [Probe.Liveness] - It seems the Distributor is delayed in processing a new session in the queue. Probe checks failed.
Command used to start Selenium Grid with Docker (or Kubernetes)
Helm chart 0.39.2
Relevant log output
16:31:38.418 INFO [NodeServer$2.lambda$start$1] - Sending registration event...
16:31:43.421 INFO [NodeServer$2.lambda$start$1] - Sending registration event...
16:31:46.218 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
16:31:48.445 INFO [LocalNode.newSession] - Session created by the Node. Id: d712a9e4ffc7956978a907188bd423cd, Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 133.0.6943.53, chrome: {chromedriverVersion: 133.0.6943.53 (9a80935019b0..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:40295}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: wss://selenium-grid.XXXX..., se:cdpVersion: 133.0.6943.53, se:containerName: selenium-grid-selenium-node..., se:noVncPort: 7900, se:vnc: wss://selenium-grid.appx.hu..., se:vncEnabled: true, se:vncLocalAddress: ws://10.1.118.22:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
16:31:48.447 INFO [LocalNode.checkSessionCount] - Draining Node, configured sessions value (1) has been reached.
2025-02-12 16:32:13,351 [Probe.Liveness] - Node responds the ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 with status: DRAINING
2025-02-12 16:32:18,227 [Probe.Liveness] - Node ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 is not found in the Grid. Node is not ready.
2025-02-12 16:32:23,305 [Probe.Liveness] - Node responds the ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 with status: DRAINING
2025-02-12 16:32:28,931 [Probe.Liveness] - Node ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 is not found in the Grid. Node is not ready.
2025-02-12 16:32:33,307 [Probe.Liveness] - Node responds the ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 with status: DRAINING
2025-02-12 16:32:36,334 [Probe.Liveness] - Node ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 is not found in the Grid. Node is not ready.
... (the same pair of DRAINING / not-found messages repeats roughly every 10 seconds) ...
2025-02-12 16:39:03,310 [Probe.Liveness] - Node responds the ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 with status: DRAINING
2025-02-12 16:39:06,336 [Probe.Liveness] - Node ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 is not found in the Grid. Node is not ready.
2025-02-12 16:39:13,321 [Probe.Liveness] - Node responds the ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 with status: DRAINING
Operating System
EKS
Docker Selenium version (image tag)
4.28.1-20250202
Selenium Grid chart version (chart version)
0.39.2
@farioas, thank you for creating this issue. We will troubleshoot it as soon as we can.
Info for maintainers
Triage this issue by using labels.
- If information is missing, add a helpful comment and then the I-issue-template label.
- If the issue is a question, add the I-question label.
- If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.
- If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.
- After troubleshooting the issue, please add the R-awaiting answer label.
Thank you!
Is this Deployment or Job?
It's a Job.
I think this is expected behavior. The message
16:31:48.447 INFO [LocalNode.checkSessionCount] - Draining Node, configured sessions value (1) has been reached.
informs that max-sessions has been reached, and the Node status switches from UP to DRAINING. The Node will be drained once the current session is completed. I am not sure whether the Node still had a session ongoing at that time; I will probably update the probe to print the number of ongoing sessions together with the Node status. Regarding the message
Node ID: bb6c2956-d79f-4727-99da-a96dd2ad0ba6 is not found in the Grid. Node is not ready.
probably the Node could not reach the Hub status endpoint to double-check the NodeId there. Can you run printenv in a Node pod and share all container env vars for further checking?
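For reference, a quick way to dump all container env vars from a Node pod (plain kubectl; the namespace and pod name are placeholders for your deployment):
❯ kubectl exec -n selenium-grid <node-pod-name> -- printenv | sort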
I already tried to fight this strange behavior of the Hub, where it kills itself because of:
It seems the Distributor is delayed in processing a new session in the queue. Probe checks failed.
This worked for a while:
hub:
  livenessProbe:
    failureThreshold: 50
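A minimal sketch of applying such a relaxation at upgrade time, assuming the chart exposes the standard Kubernetes probe fields under hub.livenessProbe (failureThreshold is used above; periodSeconds is an assumption) and the repo/release names from the chart docs:
❯ helm upgrade selenium-grid docker-selenium/selenium-grid -n selenium-grid --reuse-values \
    --set hub.livenessProbe.failureThreshold=50 \
    --set hub.livenessProbe.periodSeconds=60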
After a Hub restart, the old nodes (that were provisioned but hadn't received a task) do not connect back to the Grid, but KEDA sees all the provisioned nodes and does nothing - so we end up with an overfilled queue and idling provisioned nodes.
For now I have been able to work around this behavior by switching to distributed mode with Redis as an external data store.
That is the liveness probe in the Hub: it fails once the queue is filling up while the session count stays at 0, until failureThreshold is reached.
From your description, I understand the issue is that the Hub gets restarted. After the restart, existing Nodes cannot register back to the Hub, so no Node is attached to the Hub while a number of Node pods are still running. That confuses the KEDA scaler into thinking there are already enough nodes, so no further scaling happens.
I also discovered a scenario where the Hub gets restarted (e.g., it takes over 1 minute to become fully ready). In the meantime, Node pods are scaled up and start registration. The register-period decides how long a Node will try to register with the Distributor; after that, the Node will not attempt to register again. In the Helm chart, if the Node startup probe is enabled, the probe period is used for that register period (around 60s as I see). If the Hub comes back after this period, the Node pod will keep running and do nothing. This leads to the scenario above where autoscaling works incorrectly.
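For a plain Docker node, the equivalent knobs are exposed as container env vars that docker-selenium maps to these flags; a minimal sketch (SE_NODE_REGISTER_PERIOD / SE_NODE_REGISTER_CYCLE correspond to the --register-period / --register-cycle options visible in the node logs later in this thread; the values and hub host are illustrative):
❯ docker run -d --shm-size=2g \
    -e SE_EVENT_BUS_HOST=<hub-host> \
    -e SE_NODE_REGISTER_PERIOD=300 \
    -e SE_NODE_REGISTER_CYCLE=10 \
    selenium/node-chrome:4.28.1-20250202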
- On the Grid core side, I am adding a flag to shut down the Node if registration fails. This lets the Node container restart and retry registration; if it restarts many times, the pod status will reflect CrashLoopBackOff - https://github.com/SeleniumHQ/selenium/pull/15297
Even with isolated components, the Distributor still restarted:
2025-02-17 03:10:22,642 DEBUG [Probe.Liveness] - Distributor is healthy.
2025-02-17 03:11:22,650 DEBUG [Probe.Liveness] - Distributor is healthy.
2025-02-17 03:12:22,661 DEBUG [Probe.Liveness] - Session Queue Size: 20, Session Count: 0, Max Session: 0
2025-02-17 03:12:22,662 DEBUG [Probe.Liveness] - It seems the Distributor is delayed in processing a new session in the queue. Probe checks failed.
2025-02-17 03:13:22,661 DEBUG [Probe.Liveness] - Session Queue Size: 20, Session Count: 0, Max Session: 0
2025-02-17 03:13:22,663 DEBUG [Probe.Liveness] - It seems the Distributor is delayed in processing a new session in the queue. Probe checks failed.
... (the same pair of messages repeats every minute; Session Queue Size grows from 20 to 23 at 03:29 and to 43 at 03:36) ...
2025-02-17 03:41:22,693 DEBUG [Probe.Liveness] - Session Queue Size: 43, Session Count: 0, Max Session: 0
2025-02-17 03:41:22,694 DEBUG [Probe.Liveness] - It seems the Distributor is delayed in processing a new session in the queue. Probe checks failed.
Trapped SIGTERM/SIGINT/x so shutting down supervisord...
2025-02-17 03:41:23,517 WARN received SIGTERM indicating exit request
2025-02-17 03:41:23,518 INFO waiting for selenium-grid-distributor to die
2025-02-17 03:41:24,519 WARN stopped: selenium-grid-distributor (terminated by SIGTERM)
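When that happens, the kill should also be visible in the pod events as a probe failure; a quick check (pod name is a placeholder):
❯ kubectl describe pod -n selenium-grid <distributor-pod>
and look for "Liveness probe failed" / "Killing" entries in the Events section.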
Today it happened again, and I observed strange resource consumption in the Distributor and Router: the Distributor consumed 2 cores and the Router consumed 6 cores.
At the same time KEDA provisioned 200 pods, which is expected for nightly testing:
2025-02-20T04:17:03Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-chrome", "scaledJob.Namespace": "selenium-grid", "Number of running Jobs": 200}
But none of these pods were utilized by the Hub. Router logs:
Feb 20, 2025 3:41:11 AM org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler execute
WARNING: Unable to execute request: java.util.concurrent.TimeoutException
Build info: version: '4.28.1', revision: '73f5ad48a2'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '6.1.124-134.200.amzn2023.x86_64', java.version: '17.0.13'
Driver info: driver.version: unknown
org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.28.1', revision: '73f5ad48a2'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '6.1.124-134.200.amzn2023.x86_64', java.version: '17.0.13'
Driver info: driver.version: unknown
at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:424)
at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:54)
at org.openqa.selenium.grid.security.AddSecretFilter.lambda$apply$0(AddSecretFilter.java:40)
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:55)
at org.openqa.selenium.grid.sessionqueue.remote.RemoteNewSessionQueue.addToQueue(RemoteNewSessionQueue.java:110)
at org.openqa.selenium.grid.sessionqueue.NewSessionQueue.lambda$new$0(NewSessionQueue.java:68)
at org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle(Route.java:192)
at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:360)
at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
at org.openqa.selenium.grid.sessionqueue.NewSessionQueue.execute(NewSessionQueue.java:128)
at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:87)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:360)
at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
at org.openqa.selenium.grid.router.Router.execute(Router.java:89)
at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:360)
at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
at org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)
at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:360)
at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
at org.openqa.selenium.grid.security.BasicAuthenticationFilter.lambda$apply$0(BasicAuthenticationFilter.java:54)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:360)
at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:49)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.util.concurrent.TimeoutException
at java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2874)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
... 3 more
Distributor logs
Exception in thread "Local Distributor - Session Creation" org.openqa.selenium.WebDriverException: Command timed out after 1 minute(s)
Build info: version: '4.28.1', revision: '73f5ad48a2'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '6.1.127-135.201.amzn2023.x86_64', java.version: '17.0.13'
Driver info: driver.version: unknown
at jdk.internal.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at org.openqa.selenium.remote.ErrorCodec.decode(ErrorCodec.java:167)
at org.openqa.selenium.grid.web.Values.get(Values.java:47)
at org.openqa.selenium.grid.sessionmap.remote.RemoteSessionMap.makeRequest(RemoteSessionMap.java:118)
at org.openqa.selenium.grid.sessionmap.remote.RemoteSessionMap.add(RemoteSessionMap.java:82)
at org.openqa.selenium.grid.distributor.local.LocalDistributor.newSession(LocalDistributor.java:598)
at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.handleNewSessionRequest(LocalDistributor.java:861)
at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.lambda$run$1(LocalDistributor.java:818)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
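The failing call in this trace is the Distributor writing the new session into the Session Map. While the queue is stuck, it may be worth curling each component's /status endpoint directly; a sketch using a throwaway pod (the service name follows the chart's naming seen in the pod listing later in this thread, and 5556 is the default Session Map port - treat both as assumptions):
❯ kubectl run curl-probe -n selenium-grid --rm -it --restart=Never --image=curlimages/curl -- \
    curl -s http://selenium-grid-selenium-session-map:5556/status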
@VietND96 If you're interested, I can send you the logs from the distributor and router for the past few hours to your email for troubleshooting
Can you also share the values file that was used to deploy the setup (just the set of configs that differ from the default values.yaml)? And also the code where you initialize RemoteWebDriver (including all browser args and capabilities), so I can reproduce?
I just saw that your issue seems similar to #2655, where the Distributor and Router get rebooted.
I sent my logs and values to your email.
Here's my override for the default values.yaml:
global:
  seleniumGrid:
    stdoutProbeLog: true
    nodesImageTag: nightly # Pull the latest stable chrome image all the time (other options: dev | beta)
ingress:
  hostname: my.app.host
  nginx:
    useHttp2: false
  tls:
    - secretName: my.app.host-tls-secret
      hosts:
        - my.app.host
autoscaling:
  patchObjectFinalizers:
    enabled: false # doesn't work with argocd
    annotations: ! # remove helm hooks
  enableWithExistingKEDA: true
  scaledJobOptions:
    scalingStrategy:
      strategy: default
  scaledOptions:
    minReplicaCount: 0
    maxReplicaCount: 200
    pollingInterval: 10
chromeNode:
  enabled: true
  annotations:
    karpenter.sh/do-not-disrupt: "true" # protect from karpenter evict
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "2Gi"
      cpu:
  extraVolumeMounts:
    - name: downloads
      mountPath: /home/seluser/Downloads
  extraVolumes:
    - name: downloads
      emptyDir: {}
  extraEnvironmentVariables:
    - name: SE_BROWSER_ARGS_DISABLE_FEATURES
      value: "--disable-features=OptimizationGuideModelDownloading,OptimizationHintsFetching,OptimizationTargetPrediction,OptimizationHints"
    - name: SE_BROWSER_ARGS_DISABLE_SEARCH_ENGINE
      value: "--disable-search-engine-choice-screen"
    - name: SE_JAVA_OPTS
      value: "-Dselenium.LOGGER.level=SEVERE -Dwebdriver.chrome.whitelistedIps="
    - name: SE_VNC_NO_PASSWORD
      value: "true"
    - name: SE_SCREEN_WIDTH
      value: "1920"
    - name: SE_SCREEN_HEIGHT
      value: "1200"
    - name: SE_NODE_SESSION_TIMEOUT
      value: "1800"
    - name: SE_NODE_ENABLE_MANAGED_DOWNLOADS
      value: "true"
  readinessProbe:
    enabled: false
  livenessProbe:
    enabled: true
  hpa:
    unsafeSsl: 'false'
    platformName: "Linux"
firefoxNode:
  enabled: true
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "2Gi"
      cpu:
  extraVolumeMounts:
    - name: downloads
      mountPath: /home/seluser/Downloads
  extraVolumes:
    - name: downloads
      emptyDir: { }
  extraEnvironmentVariables:
    - name: SE_VNC_NO_PASSWORD
      value: "true"
    - name: SE_SCREEN_WIDTH
      value: "1920"
    - name: SE_SCREEN_HEIGHT
      value: "1200"
  readinessProbe:
    enabled: false
  livenessProbe:
    enabled: true
  hpa:
    platformName: "Linux"
edgeNode:
  enabled: false
redis:
  enabled: true
isolateComponents: true
components:
  sessionMap:
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu:
        memory: "2Gi"
    externalDatastore:
      enabled: true
      backend: redis
  sessionQueue:
    extraEnvironmentVariables:
      - name: SE_SESSION_REQUEST_TIMEOUT
        value: "1800"
  router:
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu:
        memory: "2Gi"
  distributor:
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"
      limits:
        cpu:
        memory: "8Gi"
basicAuth:
  enabled: true
  username: user
  password: 123456
  embeddedUrl: true
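For completeness, this override is applied roughly as follows (repo URL per the docker-selenium chart docs; the release name and values file name are illustrative):
❯ helm repo add docker-selenium https://www.selenium.dev/docker-selenium
❯ helm upgrade --install selenium-grid docker-selenium/selenium-grid -n selenium-grid -f values-override.yaml --version 0.39.2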
webdriver init snippet:
# pom.xml dep
<dependency>
    <groupId>io.github.bonigarcia</groupId>
    <artifactId>webdrivermanager</artifactId>
    <version>5.9.2</version>
    <scope>test</scope>
</dependency>
ChromeOptions chromeOpts = new ChromeOptions();
chromeOpts.setCapability("browserName", "chrome");
chromeOpts.setCapability("platformName", "Linux");
try {
    Map<String, Object> prefsMap = new HashMap<>();
    prefsMap.put("profile.default_content_settings.popups", 0);
    prefsMap.put("download.default_directory", "/home/seluser/Downloads/");
    chromeOpts.setExperimentalOption("prefs", prefsMap);

    WebDriver driver = WebDriverManager.chromedriver()
            .remoteAddress(new URL(gridUrl))
            .capabilities(chromeOpts)
            .create();
    driver.manage().window().maximize();
    FileHandler.setCurrentDownloadFolderName(driver, downloadFolderName);
    FileHandler.setCurrentDownloadFolderPath(driver, downloadFilepath);
    return driver;
} catch (MalformedURLException e) {
    throw new RuntimeException(e);
}
In the meantime, can you validate with the latest version 4.29.0?
Thanks, just applied it, I think I'll have results within this week.
After 2 weeks of testing no issues so far, thanks
Great to hear that 🎉
Seems like the issue has returned:
2025-03-12 16:03:19,067 DEBUG [Probe.Liveness] - Distributor is healthy.
2025-03-12 16:04:19,076 DEBUG [Probe.Liveness] - Distributor is healthy.
2025-03-12 16:05:19,057 DEBUG [Probe.Liveness] - Distributor is healthy.
2025-03-12 16:06:19,055 DEBUG [Probe.Liveness] - Distributor is healthy.
2025-03-12 16:07:19,060 DEBUG [Probe.Liveness] - Distributor is healthy.
2025-03-12 16:08:19,090 DEBUG [Probe.Liveness] - Session Queue Size: 19, Session Count: 0, Max Session: 0
2025-03-12 16:08:19,091 DEBUG [Probe.Liveness] - It seems the Distributor is delayed in processing a new session in the queue. Probe checks failed.
2025-03-12 16:09:19,122 DEBUG [Probe.Liveness] - Session Queue Size: 19, Session Count: 0, Max Session: 0
2025-03-12 16:09:19,123 DEBUG [Probe.Liveness] - It seems the Distributor is delayed in processing a new session in the queue. Probe checks failed.
16:09:36.285 INFO [Node.<init>] - Binding additional locator mechanisms: relative
16:09:36.580 INFO [GridModel.setAvailability] - Switching Node 1c4edfdf-bec5-45f1-a68d-fd7efdbeff55 (uri: http://10.1.95.42:5555) from DOWN to UP
16:09:36.580 INFO [LocalDistributor.add] - Added node 1c4edfdf-bec5-45f1-a68d-fd7efdbeff55 at http://10.1.95.42:5555. Health check every 120s
KEDA logs:
2025-03-12T16:07:11Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-chrome", "scaledJob.Namespace": "selenium-grid", "Number of running Jobs": 0}
2025-03-12T16:07:11Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-chrome", "scaledJob.Namespace": "selenium-grid", "Number of pending Jobs": 0}
2025-03-12T16:07:21Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-firefox", "scaledJob.Namespace": "selenium-grid", "Number of running Jobs": 0}
2025-03-12T16:07:21Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-firefox", "scaledJob.Namespace": "selenium-grid", "Number of pending Jobs": 0}
... (the same "running Jobs: 0" / "pending Jobs: 0" pairs repeat every 10 seconds for both scaled jobs) ...
2025-03-12T16:08:11Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-grid-selenium-node-chrome", "scaledJob.Namespace": "selenium-grid", "Effective number of max jobs": 19}
2025-03-12T16:08:11Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-grid-selenium-node-chrome", "scaledJob.Namespace": "selenium-grid", "Number of jobs": 19}
2025-03-12T16:08:11Z INFO scaleexecutor Created jobs {"scaledJob.Name": "selenium-grid-selenium-node-chrome", "scaledJob.Namespace": "selenium-grid", "Number of jobs": 19}
2025-03-12T16:08:21Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-firefox", "scaledJob.Namespace": "selenium-grid", "Number of running Jobs": 0}
2025-03-12T16:08:21Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-firefox", "scaledJob.Namespace": "selenium-grid", "Number of pending Jobs": 0}
2025-03-12T16:08:21Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-chrome", "scaledJob.Namespace": "selenium-grid", "Number of running Jobs": 19}
2025-03-12T16:08:21Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-chrome", "scaledJob.Namespace": "selenium-grid", "Number of pending Jobs": 19}
2025-03-12T16:08:21Z INFO scaleexecutor No need to create jobs - all requested jobs already exist {"scaledJob.Name": "selenium-grid-selenium-node-chrome", "scaledJob.Namespace": "selenium-grid", "jobs": 0}
... ("running Jobs: 19" / "pending Jobs: 19" / "No need to create jobs" repeats every 10 seconds) ...
2025-03-12T16:09:31Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-firefox", "scaledJob.Namespace": "selenium-grid", "Number of running Jobs": 0}
2025-03-12T16:09:31Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-node-firefox", "scaledJob.Namespace": "selenium-grid", "Number of pending Jobs": 0}
Also reproducible after upgrade to chart 0.40.1
❯ k get pods -n selenium-grid
NAME READY STATUS RESTARTS AGE
selenium-grid-redis-master-0 1/1 Running 0 6h52m
selenium-grid-selenium-distributor-57b4f9b486-q5bxl 1/1 Running 0 6h59m
selenium-grid-selenium-event-bus-fc79c4f9d-lxhmz 1/1 Running 0 6h42m
selenium-grid-selenium-node-chrome-28llw-sr5r6 1/1 Running 0 19m
selenium-grid-selenium-node-chrome-5xglm-7gfwl 1/1 Running 0 19m
selenium-grid-selenium-node-chrome-64v4l-f27gg 1/1 Running 0 21m
selenium-grid-selenium-node-chrome-8gtn8-sxg8x 1/1 Running 0 19m
selenium-grid-selenium-node-chrome-b2jz9-l7r9h 1/1 Running 0 25m
selenium-grid-selenium-node-chrome-bc6x9-75hrj 1/1 Running 0 20m
selenium-grid-selenium-node-chrome-cqn5m-qs69s 1/1 Running 0 19m
selenium-grid-selenium-node-chrome-h6dgt-lv6nm 1/1 Running 0 23m
selenium-grid-selenium-node-chrome-hxcrb-66ffg 1/1 Running 0 25m
selenium-grid-selenium-node-chrome-kggj4-kcf86 1/1 Running 0 19m
selenium-grid-selenium-node-chrome-kzvx2-42s4x 1/1 Running 0 25m
selenium-grid-selenium-node-chrome-ls7d2-jg2ns 1/1 Running 0 20m
selenium-grid-selenium-node-chrome-m2tb8-f2skm 1/1 Running 0 20m
selenium-grid-selenium-node-chrome-m59qb-9cd2f 1/1 Running 0 20m
selenium-grid-selenium-node-chrome-pxnd4-r6jpn 1/1 Running 0 20m
selenium-grid-selenium-node-chrome-qzthw-rwh24 1/1 Running 0 19m
selenium-grid-selenium-node-chrome-vvmmm-g9r7s 1/1 Running 0 30h
selenium-grid-selenium-node-chrome-wnjmk-xff5q 1/1 Running 0 20m
selenium-grid-selenium-router-7c98c9d8d5-66vbq 1/1 Running 0 6h52m
selenium-grid-selenium-session-map-cf95f846b-jfn6z 1/1 Running 0 6h52m
selenium-grid-selenium-session-queue-7c44c456d8-8q975 1/1 Running 0 6h49m
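A quick way to spot pods that have outlived their expected lifetime is to sort by start time (standard kubectl):
❯ kubectl get pods -n selenium-grid --sort-by=.status.startTime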
There are some nodes that have been up and running for 30h, when they should have been downscaled (or have failed to register):
❯ k -n selenium-grid logs selenium-grid-selenium-node-chrome-vvmmm-g9r7s
Defaulted container "selenium-grid-selenium-node-chrome" out of: selenium-grid-selenium-node-chrome, pre-puller-selenium-grid-selenium-node-chrome (init)
2025-03-13 02:37:54,579 INFO Included extra file "/etc/supervisor/conf.d/chrome-cleanup.conf" during parsing
2025-03-13 02:37:54,579 INFO Included extra file "/etc/supervisor/conf.d/recorder.conf" during parsing
2025-03-13 02:37:54,579 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2025-03-13 02:37:54,579 INFO Included extra file "/etc/supervisor/conf.d/uploader.conf" during parsing
2025-03-13 02:37:54,582 INFO RPC interface 'supervisor' initialized
2025-03-13 02:37:54,582 INFO supervisord started with pid 8
2025-03-13 02:37:55,588 INFO spawned: 'xvfb' with pid 9
2025-03-13 02:37:55,590 INFO spawned: 'vnc' with pid 10
2025-03-13 02:37:55,592 INFO spawned: 'novnc' with pid 11
2025-03-13 02:37:55,594 INFO spawned: 'selenium-node' with pid 12
2025-03-13 02:37:55,608 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Appending Selenium option: --enable-managed-downloads true
Appending Selenium option: --register-period 120
Appending Selenium option: --register-cycle 5
Appending Selenium option: --register-shutdown-on-failure
Appending Selenium option: --heartbeat-period 30
Appending Selenium option: --log-level INFO
Appending Selenium option: --http-logs false
Appending Selenium option: --structured-logs false
Generating Selenium Config
Appending Selenium option: --tracing false
Tracing is disabled
Appending Selenium option: --bind-host false
Appending Selenium option: --config /opt/selenium/config.toml
Selenium Grid Node configuration:
[events]
publish = "tcp://selenium-grid-selenium-event-bus:4442"
subscribe = "tcp://selenium-grid-selenium-event-bus:4443"
[server]
host = "10.1.60.149"
port = "5555"
[node]
grid-url = "https://XXXX"
session-timeout = 1800
override-max-sessions = false
detect-drivers = false
drain-after-session-count = 1
max-sessions = 1
[[node.driver-configuration]]
display-name = "chrome"
stereotype = '{"browserName": "chrome", "browserVersion": "", "platformName": "Linux", "goog:chromeOptions": {"binary": "/usr/bin/google-chrome"}, "se:containerName": "selenium-grid-selenium-node-chrome-vvmmm-g9r7s", "container:hostname": "selenium-grid-selenium-node-chrome-vvmmm-g9r7s"}'
max-sessions = 1
Starting Selenium Grid Node...
Using JAVA_OPTS: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/selenium/logs -Dselenium.LOGGER.level=SEVERE -Dwebdriver.chrome.whitelistedIps= -Dwebdriver.remote.enableTracing=false -Dwebdriver.httpclient.version=HTTP_1_1
02:37:56.402 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
02:37:56.408 INFO [LoggingOptions.getTracer] - Using null tracer
02:37:56.456 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://selenium-grid-selenium-event-bus:4442 and tcp://selenium-grid-selenium-event-bus:4443
02:37:56.504 INFO [UnboundZmqEventBus.<init>] - Sockets created
2025-03-13 02:37:56,616 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-03-13 02:37:56,616 INFO success: vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-03-13 02:37:56,616 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
02:37:57.505 INFO [UnboundZmqEventBus.<init>] - Event bus ready
02:37:57.641 INFO [NodeServer.createHandlers] - Reporting self as: http://10.1.60.149:5555
02:37:57.645 INFO [LoggingOptions.getTracer] - Using null tracer
02:37:57.661 INFO [NodeOptions.getSessionFactories] - Detected 8 available processors
02:37:57.712 INFO [NodeOptions.report] - Adding chrome for {"browserName": "chrome","browserVersion": "","container:hostname": "selenium-grid-selenium-node-chrome-vvmmm-g9r7s","goog:chromeOptions": {"binary": "\u002fusr\u002fbin\u002fgoogle-chrome"},"platformName": "linux","se:containerName": "selenium-grid-selenium-node-chrome-vvmmm-g9r7s","se:downloadsEnabled": true,"se:noVncPort": 7900,"se:vncEnabled": true} 1 times
02:37:57.753 INFO [Node.<init>] - Binding additional locator mechanisms: relative
02:37:57.919 INFO [NodeServer$2.start] - Starting registration process for Node http://10.1.60.149:5555
02:37:57.921 INFO [NodeServer.execute] - Started Selenium node 4.29.0 (revision 18ae989): http://10.1.60.149:5555
02:37:57.933 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
02:37:58.031 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
02:37:58.666 INFO [LocalNode.newSession] - Session created by the Node. Id: c1a53540374088557a8c27905f0cf44a, Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 134.0.6998.35, chrome: {chromedriverVersion: 134.0.6998.35 (ea6ef4c2ac15..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:32891}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: wss://selenium-grid.appx.hu..., se:cdpVersion: 134.0.6998.35, se:containerName: selenium-grid-selenium-node..., se:downloadsEnabled: true, se:noVncPort: 7900, se:vnc: wss://selenium-grid.appx.hu..., se:vncEnabled: true, se:vncLocalAddress: ws://10.1.60.149:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
02:37:58.668 INFO [LocalNode.checkSessionCount] - Draining Node, configured sessions value (1) has been reached.
02:37:58.716 INFO [ProxyNodeWebsockets.createWsEndPoint] - Establishing connection to ws://localhost:32891/devtools/browser/ca3c1b6a-cdb8-47d6-9b83-e152fc17cb7e
02:39:35.297 WARN [ProxyNodeWebsockets$ForwardingListener.onError] - Error proxying websocket command
java.net.SocketException: Connection reset
at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:401)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:434)
at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1178)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:841)
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:181)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:280)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:233)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:782)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:965)
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:253)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:1469)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:1414)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:1414)
02:39:35.383 INFO [SessionSlot.stop] - Stopping session c1a53540374088557a8c27905f0cf44a
02:39:35.383 INFO [LocalNode.stopTimedOutSession] - Node draining complete!
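As a stop-gap, a Node stuck like this can be drained by hand via the Grid node API (the drain endpoint and the empty X-REGISTRATION-SECRET header are from the Selenium Grid docs; the IP is the one reported in the log above), or the pod can simply be deleted:
❯ curl -X POST http://10.1.60.149:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;'
❯ kubectl delete pod -n selenium-grid selenium-grid-selenium-node-chrome-vvmmm-g9r7s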
There are some nodes that have been up and running for 30h, when they should have been downscaled (or have failed to register):
I am checking this case. It looks like the Node went through "02:39:35.383 INFO [LocalNode.stopTimedOutSession] - Node draining complete!" but did not treat drain-after-session-count reaching 1 as a signal to shut the Node down completely.