selenium
[🐛 Bug]: Error handling in Java e.g. SessionNotCreatedException/UnreachableBrowserException caused by Error
What happened?
We had some issues with our system, which led to an OutOfMemoryError.
This is nothing Selenium should try to handle and wrap into a SessionNotCreatedException or UnreachableBrowserException in RemoteWebDriver#execute.
The javadoc of Error states: "... indicates serious problems that a reasonable application should not try to catch."
By wrapping this into a SessionNotCreatedException or UnreachableBrowserException, the Error is "downgraded" to a RuntimeException, which its javadoc describes as the "... superclass of those exceptions that can be thrown during the normal operation of the Java Virtual Machine."
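To make the "downgrade" concrete, here is a minimal sketch that does not depend on Selenium. `WrappingException`, `executeAndWrap` and `callAndDescribe` are hypothetical names invented for this illustration; `WrappingException` only stands in for SessionNotCreatedException / UnreachableBrowserException and is not the library's code.

```java
// Illustration only: WrappingException is a hypothetical stand-in for
// SessionNotCreatedException / UnreachableBrowserException.
class ErrorDowngradeDemo {

  static class WrappingException extends RuntimeException {
    WrappingException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Mimics a library call that catches Throwable and wraps whatever it finds.
  static void executeAndWrap(Runnable command) {
    try {
      command.run();
    } catch (Throwable t) {
      throw new WrappingException("Could not start a new session.", t);
    }
  }

  // Typical application-level handler: it only expects "normal" exceptions.
  static String callAndDescribe(Runnable command) {
    try {
      executeAndWrap(command);
      return "ok";
    } catch (RuntimeException e) {
      // The Error ends up here, although the caller never asked to catch Errors.
      return "handled " + e.getClass().getSimpleName()
          + " caused by " + e.getCause().getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(callAndDescribe(() -> {
      throw new OutOfMemoryError("this just a fake");
    }));
  }
}
```

The `catch (RuntimeException e)` branch runs even though the root problem is an Error, which is the behavior this report is about.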
One might try to run the browser on a different machine after a SessionNotCreatedException has occurred. But the real reason might be a local issue (e.g. OutOfMemoryError, NoClassDefFoundError, etc.).
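As a caller-side workaround under the current behavior, one could inspect the cause chain before deciding to retry on another machine. The sketch below is only an assumption about how such a check might look; `RootCauseCheck` and `findError` are hypothetical names, and a plain RuntimeException stands in for the Selenium exception.

```java
class RootCauseCheck {

  // Returns the first java.lang.Error found in the cause chain, or null if there is none.
  static Error findError(Throwable thrown) {
    // Bound the walk so a cyclic cause chain cannot loop forever.
    int depth = 0;
    for (Throwable t = thrown; t != null && depth < 32; t = t.getCause(), depth++) {
      if (t instanceof Error) {
        return (Error) t;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Stand-in for the wrapped exception shown in the log output of this report.
    RuntimeException wrapped = new RuntimeException(
        "Could not start a new session.", new OutOfMemoryError("this just a fake"));
    Error local = findError(wrapped);
    if (local != null) {
      System.out.println("Local problem, do not retry elsewhere: " + local);
    } else {
      System.out.println("No Error in the cause chain, retrying elsewhere may help");
    }
  }
}
```

This keeps the decision with the caller, but it has to be repeated at every call site, which is why handling it inside the library would be preferable.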
I am not sure whether there are other places where this applies.
How can we reproduce the issue?
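import java.io.IOException;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.*;

public class Dummy {
  public static void main(String[] args) {
    // The executor always fails with an Error to simulate a local JVM problem.
    new RemoteWebDriver(new CommandExecutor() {
      @Override
      public Response execute(Command command) throws IOException {
        throw new OutOfMemoryError("this just a fake");
      }
    }, new ChromeOptions());
  }
}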
Relevant log output
Exception in thread "main" org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: '4.3.0', revision: 'a4995e2c09*'
System info: host: 'myHost', ip: '192.168.1.10', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '17.0.3'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [null, newSession {capabilities=[Capabilities {browserName: chrome, goog:chromeOptions: {args: [], extensions: []}}], desiredCapabilities=Capabilities {browserName: chrome, goog:chromeOptions: {args: [], extensions: []}}}]
Capabilities {}
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:587)
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:264)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:179)
    at just.a.Dummy.main(Dummy.java:21)
Caused by: java.lang.OutOfMemoryError: this just a fake
    at just.a.Dummy$1.execute(Dummy.java:24)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:569)
    ... 3 more
Operating System
Windows 10 x64
Selenium version
Java 4.x
What are the browser(s) and version(s) where you see this issue?
N/A
What are the browser driver(s) and version(s) where you see this issue?
N/A
Are you using Selenium Grid?
No
@joerg1985, thank you for creating this issue. We will troubleshoot it as soon as we can.
You have a good point there. It does make sense to avoid wrapping this type of exception so the end user realizes the error and takes action.
Nevertheless, I think we need to understand the context and what is being done with each one of the exceptions.
SessionNotCreatedException is thrown while a new session request is being processed.
First, let's analyze the remote use case (e.g. with Selenium Grid). SessionNotCreatedException is how the Distributor learns that something went wrong while creating the session, so it retries on a different Node. If the Grid has several Nodes and one of them throws an OutOfMemoryError, the session is simply created on a different Node, and this should be transparent to the user, who just wants to run their tests. However, the person in charge of monitoring the infrastructure should notice problems on that machine without needing to receive an OutOfMemoryError.
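The retry behavior described above can be pictured with a toy model. This is not the Grid's actual Distributor code: `DistributorRetrySketch`, `SessionNotCreated` and `createSession` are hypothetical names, and each Node is reduced to a `Supplier` that either returns a session id or throws.

```java
import java.util.List;
import java.util.function.Supplier;

class DistributorRetrySketch {

  // Hypothetical stand-in for org.openqa.selenium.SessionNotCreatedException.
  static class SessionNotCreated extends RuntimeException {
    SessionNotCreated(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Tries each node in order; a SessionNotCreated failure means "try the next one".
  static String createSession(List<Supplier<String>> nodes) {
    SessionNotCreated last = null;
    for (Supplier<String> node : nodes) {
      try {
        return node.get();
      } catch (SessionNotCreated e) {
        last = e; // remember the failure and move on to the next node
      }
    }
    throw new SessionNotCreated("No node could create the session", last);
  }

  public static void main(String[] args) {
    Supplier<String> brokenNode = () -> {
      throw new SessionNotCreated("node failed", new OutOfMemoryError("this just a fake"));
    };
    Supplier<String> healthyNode = () -> "session-on-healthy-node";
    // The failure of the first node stays invisible to the caller.
    System.out.println(createSession(List.of(brokenNode, healthyNode)));
  }
}
```

In this model an Error that escaped unwrapped from a node would abort the loop instead of moving on to the next node, which is the concern with the remote use case.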
In the local use case, it might make more sense to throw the exception rather than wrap it. The problem is that the Grid and the Java bindings share the same code base, and adding logic to detect whether the code is running as part of a Grid or as a local execution is less than ideal. In reality, if a user running a test locally hits this issue, they will most likely notice that their machine is having problems.
UnreachableBrowserException is the last resort to wrap any unhandled exception
Here we can actually follow the advice. We can probably check whether the throwable is an OutOfMemoryError and, if so, rethrow it instead of wrapping it. We'd be happy to receive a PR that does this. However, the whole flow needs to be checked, since the expectation is to pass around WebDriverException.
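A rough sketch of what such a check could look like, independent of the real RemoteWebDriver#execute implementation: `RethrowErrorsSketch`, `UnreachableBrowser` and this `execute` method are hypothetical names for illustration, and the sketch rethrows every Error rather than only OutOfMemoryError, which is an assumption.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class RethrowErrorsSketch {

  // Hypothetical stand-in for org.openqa.selenium.remote.UnreachableBrowserException.
  static class UnreachableBrowser extends RuntimeException {
    UnreachableBrowser(String message, Throwable cause) {
      super(message, cause);
    }
  }

  static <T> T execute(Callable<T> command) {
    try {
      return command.call();
    } catch (Throwable t) {
      if (t instanceof Error) {
        // Serious JVM-level problems are passed through untouched.
        throw (Error) t;
      }
      // Everything else is still wrapped, as the last resort.
      throw new UnreachableBrowser("Error communicating with the remote browser.", t);
    }
  }

  public static void main(String[] args) {
    Callable<String> outOfMemory = () -> {
      throw new OutOfMemoryError("this just a fake");
    };
    Callable<String> ioFailure = () -> {
      throw new IOException("connection refused");
    };
    try {
      execute(outOfMemory);
    } catch (OutOfMemoryError e) {
      System.out.println("Error propagated unwrapped: " + e);
    }
    try {
      execute(ioFailure);
    } catch (UnreachableBrowser e) {
      System.out.println("Exception wrapped, cause: " + e.getCause());
    }
  }
}
```

Callers that expect only WebDriverException would now also see raw Errors, so the surrounding flow would need the review mentioned above.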
@automationpi, feel free to rework #10919 based on the comments above, thanks.
This issue is looking for contributors.
Please comment below or reach out to us through our IRC/Slack/Matrix channels if you are interested.
Regarding the PR: it might make sense to handle all Errors the "new" way, not only the two examples?
Right, so anything that was not handled.
Regarding the "new" way of handling Errors: I am not familiar with the Grid, so these are just my current thoughts on this, and they might be total nonsense :D
An Error could let the Distributor disconnect the Node and retry on a different Node? Is there an automatic reconnect by the Node in this case? If it is totally broken it will probably not be able to reconnect, otherwise it will fix itself.
But this might be a big change for a rare case ...
> An Error could let the Distributor disconnect the Node and retry on a different Node?
SessionNotCreatedException is what tells the Distributor to try on a different Node. There is a separate thread checking on the Node health, and after a few failed checks, it gets removed from the Grid.
> Is there an automatic reconnect by the Node in this case? If it is totally broken it will probably not be able to reconnect, otherwise it will fix itself.
If the Node comes back to a healthy state, it will reconnect itself.
After reading the whole thread again, I am not sure a change is needed. I believe the reasoning I wrote above is still valid after the questions and comments from @joerg1985, which I answered right above. In addition, the linked PR only wraps the exception and changes the message sent back.
I will close this issue as it seems that the current behavior in the code is good enough. If some other reasoning comes around later, we can reopen and figure it out.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.