
[🐛 Bug]: Message: Could not start a new session. Could not start a new session. Error while creating session with the driver service

Open jakobdo opened this issue 1 year ago • 10 comments

What happened?

I am trying to start a new session in a selenium grid and from time to time, I am getting this error:

selenium.common.exceptions.SessionNotCreatedException: Message: Could not start a new session. Could not start a new session. Error while creating session with the driver service. Stopping driver service: Could not start a new session. Response code 500. Message: Failed to decode response from marionette
Host info: host: 'SRVP01234', ip: '192.1168.1.68'
Build info: version: '4.16.1', revision: '9b4c83354e'
System info: os.name: 'Windows Server 2022', os.arch: 'amd64', os.version: '10.0', java.version: '21.0.1'
Driver info: driver.version: unknown
Build info: version: '4.16.1', revision: '9b4c83354e'
System info: os.name: 'Windows Server 2022', os.arch: 'amd64', os.version: '10.0', java.version: '21.0.1'
Driver info: driver.version: unknown
Build info: version: '4.16.1', revision: '9b4c83354e'
System info: os.name: 'Windows Server 2022', os.arch: 'amd64', os.version: '10.0', java.version: '21.0.1'
Driver info: driver.version: unknown
Stacktrace:
    at org.openqa.selenium.grid.node.remote.RemoteNode.newSession (RemoteNode.java:151)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor.startSession (LocalDistributor.java:645)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor.newSession (LocalDistributor.java:564)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.handleNewSessionRequest (LocalDistributor.java:824)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.lambda$run$1 (LocalDistributor.java:784)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1144)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:642)
    at java.lang.Thread.run (Thread.java:1583)

How can we reproduce the issue?

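I am trying to start a session using Python and this simplified code:

from selenium import webdriver
from selenium.webdriver import FirefoxOptions
from selenium.webdriver.remote.file_detector import LocalFileDetector

# file_detector was not defined in the simplified snippet;
# LocalFileDetector is assumed here so the example runs on its own.
file_detector = LocalFileDetector()

with webdriver.Remote(
    command_executor="http://192.168.1.60:4444",
    options=FirefoxOptions(),
    file_detector=file_detector,
) as driver:
    driver.get("http://google.com")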

The config for the nodes is:
[node]
driver-implementation = ["chrome", "firefox"]
hub = "selenium-grid.domain.local"
selenium-manager = true
max-sessions = 5
override-max-sessions = true
session-timeout = 60

I have also tried the default settings for max-sessions and override-max-sessions, which gave the same result.

Relevant log output

selenium.common.exceptions.SessionNotCreatedException: Message: Could not start a new session. Could not start a new session. Error while creating session with the driver service. Stopping driver service: Could not start a new session. Response code 500. Message: Failed to decode response from marionette  
Host info: host: 'SRVP01234', ip: '192.1168.1.68'
Build info: version: '4.16.1', revision: '9b4c83354e'
System info: os.name: 'Windows Server 2022', os.arch: 'amd64', os.version: '10.0', java.version: '21.0.1'
Driver info: driver.version: unknown
Build info: version: '4.16.1', revision: '9b4c83354e'
System info: os.name: 'Windows Server 2022', os.arch: 'amd64', os.version: '10.0', java.version: '21.0.1'
Driver info: driver.version: unknown
Build info: version: '4.16.1', revision: '9b4c83354e'
System info: os.name: 'Windows Server 2022', os.arch: 'amd64', os.version: '10.0', java.version: '21.0.1'
Driver info: driver.version: unknown
Stacktrace:
    at org.openqa.selenium.grid.node.remote.RemoteNode.newSession (RemoteNode.java:151)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor.startSession (LocalDistributor.java:645)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor.newSession (LocalDistributor.java:564)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.handleNewSessionRequest (LocalDistributor.java:824)
    at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.lambda$run$1 (LocalDistributor.java:784)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1144)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:642)
    at java.lang.Thread.run (Thread.java:1583)

Operating System

Windows Server 2022

Selenium version

Python 4.17.1

What are the browser(s) and version(s) where you see this issue?

Firefox 122 64bit, Chrome 121.0.6167.86 64bit

What are the browser driver(s) and version(s) where you see this issue?

Firefox GeckoDriver 0.34 (I am using this option: selenium-manager = true)

Are you using Selenium Grid?

Yes: 4.16.1

jakobdo avatar Jan 25 '24 07:01 jakobdo

@jakobdo, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

github-actions[bot] avatar Jan 25 '24 07:01 github-actions[bot]

Can you share the complete log from the Grid? Also the command you use to start the Grid.

diemol avatar Jan 25 '24 08:01 diemol

Hello diemol, and thanks for taking the time to look into this issue. I have to admit that I was previously able to re-create the issue in the following way: start session 1 and kill it before calling driver.close() / driver.quit(), then start session 2, which failed with the above error. But since I enabled logging, I have not been able to get the error at all. So I have to say no, I cannot share a complete log yet.
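The reproduction described above (a first session killed before driver.quit(), then a second session failing to start) suggests a client-side workaround that can be sketched as follows. This is only a minimal sketch, not a fix for the underlying error: it assumes the failure is transient and that waiting before retrying helps, which the thread does not confirm. The helper name create_with_retry and its parameters are hypothetical; the Selenium usage is shown only in comments so the block itself depends on nothing but the standard library.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def create_with_retry(
    factory: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call factory() until it succeeds or the attempts are exhausted.

    Hypothetical helper: only exceptions listed in retry_on trigger a retry;
    the last such exception is re-raised when every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                # Wait before the next attempt; no wait after the final failure.
                sleep(delay_seconds)
    assert last_error is not None
    raise last_error


# Assumed usage with Selenium (not executed here):
#
#   from selenium import webdriver
#   from selenium.webdriver import FirefoxOptions
#   from selenium.common.exceptions import SessionNotCreatedException
#
#   driver = create_with_retry(
#       lambda: webdriver.Remote(
#           command_executor="http://192.168.1.60:4444",
#           options=FirefoxOptions(),
#       ),
#       retry_on=(SessionNotCreatedException,),
#   )
#   try:
#       driver.get("http://google.com")
#   finally:
#       driver.quit()  # always release the session on the node
```

The try/finally in the usage comment matters as much as the retry: a client that exits without calling driver.quit() leaves the session open on the node until session-timeout expires.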

The selenium grid is running as a service, but started with this command:

java -jar selenium-server.jar node --config config.toml

selenium-server.jar = 4.16.1

config.toml:
[node]
driver-implementation = ["chrome", "firefox"]
hub = "selenium-grid.domain.local"
selenium-manager = true
session-timeout = 60

[logging]
log-file = 'C:\selenium\logs\gridnode.log'

I have also tried different configs that allow more sessions in order to force the error, but I could not trigger it this time.

jakobdo avatar Jan 25 '24 10:01 jakobdo

I have just been informed that the servers are running with dynamic memory, so the problem could be related to the servers having too little RAM at startup. They have now been boosted and support multiple sessions.

jakobdo avatar Jan 25 '24 11:01 jakobdo

We have now changed the servers' memory to fixed/static memory, 6 GB per server, and I have been able to get the error while logging was enabled. How much of the log do you want? I do not want to share too many details about the server/setup (from a security point of view).
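For sharing a log without exposing server details, one option is to mask the host and ip fields before attaching it. The sketch below assumes the only sensitive values are the host: '...' and ip: '...' fields seen in the "Host info" line of the error above; the function name redact_host_info is hypothetical, and a real log may contain other identifying data (URLs, paths, user names) that this does not cover.

```python
import re

# Matches the host: '...' and ip: '...' fields as they appear in the
# "Host info" line of the Grid error message.
HOST_INFO_FIELD = re.compile(r"\b(host|ip): '[^']*'")


def redact_host_info(line: str) -> str:
    """Replace the values of host and ip fields with a placeholder."""
    return HOST_INFO_FIELD.sub(lambda m: f"{m.group(1)}: '<redacted>'", line)


print(redact_host_info("Host info: host: 'SRVP01234', ip: '192.1168.1.68'"))
# prints: Host info: host: '<redacted>', ip: '<redacted>'
```

Lines without these fields, such as the Build info and System info lines, pass through unchanged.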

jakobdo avatar Jan 25 '24 14:01 jakobdo

Please find attached log from node.

github.selenium.grid.log

jakobdo avatar Jan 26 '24 10:01 jakobdo

Do you guys need more input/information from my side, to investigate this issue?

jakobdo avatar Jan 29 '24 09:01 jakobdo

I can see this in the logs:

Response code 500. Message: Failed to decode response from marionette

Which points to an error in GeckoDriver. After Googling, a workaround says that increasing memory helps (in case the host has limited resources).

I am not sure we can give any more pointers because you already mentioned above that the situation improved when more memory was allocated.

You should collect some CPU and memory metrics from your infrastructure and correlate when the issue happens and what resources were being used.
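The correlation suggested above can be sketched roughly like this: given memory samples collected over time and the timestamps at which the error occurred, look up the most recent sample before each failure. This is a minimal sketch under assumptions: the names Sample, sample_before and correlate are hypothetical, the samples are assumed to be collected by some other tool, and the values in the usage example are made up for illustration only, not measurements from this setup.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# (timestamp, memory used in MB); how the samples are collected is out of scope.
Sample = Tuple[datetime, float]


def sample_before(samples: List[Sample], when: datetime) -> Optional[Sample]:
    """Return the latest sample taken at or before `when`, or None.

    `samples` must be sorted by timestamp.
    """
    timestamps = [ts for ts, _ in samples]
    index = bisect_right(timestamps, when)
    return samples[index - 1] if index > 0 else None


def correlate(
    samples: List[Sample], failures: List[datetime]
) -> List[Tuple[datetime, Optional[Sample]]]:
    """Pair each failure time with the closest preceding memory sample."""
    ordered = sorted(samples)
    return [(failure, sample_before(ordered, failure)) for failure in failures]


# Illustrative values only.
samples = [
    (datetime(2024, 1, 30, 11, 0, 0), 3200.0),
    (datetime(2024, 1, 30, 11, 0, 30), 3900.0),
]
failures = [datetime(2024, 1, 30, 11, 0, 40)]
for failure, sample in correlate(samples, failures):
    print(failure, sample)
```

The same lookup works for CPU samples; only the meaning of the second tuple element changes.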

diemol avatar Jan 29 '24 10:01 diemol

Hello again, and thanks for once again taking the time to look into this issue. I have now bumped the memory to 16 GB on one of the nodes and stopped the service on the others. I am getting some errors during startup, and I am unsure whether they are related (see attached file: grid.log).

I tried to contact the Selenium Grid and once again got the same error that started this issue. Then I restarted the service and it worked. I will now try to restart the server and see if I can get the error once more. When the error occurs, memory usage is around 3-4 GB and is not increasing.

jakobdo avatar Jan 30 '24 11:01 jakobdo

I have exactly the same problem, and I am on the latest Selenium, which is 4.17. I reported a bug as well: https://github.com/SeleniumHQ/selenium/issues/13562

feller-kristina avatar Feb 08 '24 23:02 feller-kristina

Hi, @jakobdo. This issue has been determined to require fixes in GeckoDriver.

You can see if the feature is passing in the Web Platform Tests.

If it is something new, please create an Issue with the GeckoDriver team.

Feel free to comment the issues that you raise back in this issue. Thank you.

github-actions[bot] avatar Mar 08 '24 22:03 github-actions[bot]

This issue has been automatically locked since there has not been any recent activity since it was closed. Please open a new issue for related bugs.

github-actions[bot] avatar Apr 07 '24 22:04 github-actions[bot]