selenium-google-code-issue-archive

Thread safety issue when exception occurs during RemoteWebDriver quit within Selenium Grid

Open lukeis opened this issue 8 years ago • 6 comments

Originally reported on Google Code with ID 8441

Apologies for not being able to create a reproducible example, but this happens
a few times a week and the scenario is always the same:

Selenium Grid version: 2.43 (but seen in earlier versions too)
Bindings: Ruby using Persistent Connections
Test Framework: RSpec2
Executor: parallel_test (20 threads)

Scenario:

Quit is called on Session A, but a socket timeout exception occurs during the request
(see session A test.log).

This can be seen on the Selenium Hub, but it actually appears twice, 60 seconds apart (see
hub.log).

Meanwhile, a test running Session B in a separate thread stops running with an SO_TIMEOUT
error (see Session B test.log).

Note that the hub.log has no reference to any socket timeout for Session B.

The client.log taken from the Selenium node which is running Session B shows that no
error has been recorded on the node (see session B test client.log). NB: there are
concurrent Chrome sessions open on this node, BUT NOT the session which had the original
timeout.

This error, in which an unrelated session is terminated, happens a few times a week but
affects only a small percentage of the overall tests executed.

Although I have been unable to create a test to replicate it, my feeling is that when
a request to stop a session results in an exception, registry.terminate is called
twice in two asynchronous threads (see current request handler file), and that this
may be a scenario which the threading approach does not account for.

We could make a change to the request handler, but I am not sure of the implications of
this (see proposed request handler file).

It would mean that if an exception occurred during 'Quit', the session would not be
marked as Stop_Session but would continue to be SO_TIMEOUT instead, and I suppose this
could be a problem.
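
To illustrate the "terminate called twice" concern, here is a minimal sketch of a registry whose terminate is safe to call from two threads. It is written in Ruby purely for illustration (the Grid hub itself is Java), and `SessionRegistry`, `add`, `terminate` and the slot names are hypothetical stand-ins, not the Grid API or the attached request handler. The idea is only that whichever caller removes the session first does the cleanup, and the second call becomes a no-op.

```ruby
# Hypothetical stand-in for a session registry; NOT the Selenium Grid code.
class SessionRegistry
  attr_reader :released

  def initialize
    @lock = Mutex.new
    @sessions = {}   # session id => test slot (illustrative)
    @released = []   # record of [slot, reason] cleanups performed
  end

  def add(session_id, slot)
    @lock.synchronize { @sessions[session_id] = slot }
  end

  # Returns true only for the caller that actually removed the session.
  # A second call for the same id finds nothing and does no cleanup.
  def terminate(session_id, reason)
    @lock.synchronize do
      slot = @sessions.delete(session_id)
      return false if slot.nil?

      @released << [slot, reason]
      true
    end
  end
end

registry = SessionRegistry.new
registry.add("session-a", "slot-1")

# Two threads race to terminate the same session, as in the suspected scenario.
threads = [:client_stopped_session, :so_timeout].map do |reason|
  Thread.new { registry.terminate("session-a", reason) }
end
results = threads.map(&:value)

puts "cleanups performed: #{registry.released.size}"
```

Which reason "wins" depends on thread scheduling, so a real fix would still need to decide whether the session should end up recorded as stopped by the client or as timed out.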

Reported by wareham.robbie on 2015-01-30 12:15:09


- _Attachment: [hub.log](https://storage.googleapis.com/google-code-attachments/selenium/issue-8441/comment-0/hub.log)_
- _Attachment: [Session B test.log](https://storage.googleapis.com/google-code-attachments/selenium/issue-8441/comment-0/Session B test.log)_
- _Attachment: [current request handler](https://storage.googleapis.com/google-code-attachments/selenium/issue-8441/comment-0/current request handler)_
- _Attachment: [proposed request handler.txt](https://storage.googleapis.com/google-code-attachments/selenium/issue-8441/comment-0/proposed request handler.txt)_

lukeis avatar Mar 04 '16 09:03 lukeis

Oops, forgot to attach a file.

Reported by wareham.robbie on 2015-01-30 12:40:37


- _Attachment: [session A test.log](https://storage.googleapis.com/google-code-attachments/selenium/issue-8441/comment-2/session A test.log)_

lukeis avatar Mar 04 '16 09:03 lukeis

Reported by barancev on 2015-02-02 18:48:25

  • Labels added: Component-Grid

lukeis avatar Mar 04 '16 09:03 lukeis

I have changed the timeout value for the Ruby persistent client to 120 seconds
(default is 60), and I see this problem occur a lot more often. The Selenium Hub log also
shows the Socket Timeout issue occurring 3 times for each session rather than 2, each
1 minute apart.

I have noticed that in the Ruby client code, requests can be retried up to 3 times:

MAX_RETRIES = 3

retries = 0
begin
  response = response_for(request)
rescue Errno::ECONNABORTED, Errno::ECONNRESET, Errno::EADDRINUSE
  # a retry is sometimes needed on Windows XP where we may quickly
  # run out of ephemeral ports
  #
  # A more robust solution is bumping the MaxUserPort setting
  # as described here:
  #
  # http://msdn.microsoft.com/en-us/library/aa560610%28v=bts.20%29.aspx
  raise if retries >= MAX_RETRIES

  request = new_request_for(verb, url, headers, payload)
  retries += 1

  retry

Reported by wareham.robbie on 2015-02-04 10:52:04

lukeis avatar Mar 04 '16 09:03 lukeis

I have monkey patched MAX_RETRIES to 0, and still see this issue, but with only 1
SO_TIMEOUT logged.

The scenario is still the same: SO_TIMEOUT occurs while calling driver.quit (i.e. DELETE),
and some unrelated session then fails with either SO_TIMEOUT or CLIENT_STOPPED_SESSION.
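
For anyone wanting to try the same experiment, here is a rough sketch of how a retry constant can be overridden at runtime in Ruby. `StandInHttpClient` and `count_attempts` are hypothetical stand-ins so that the snippet runs without the selenium-webdriver gem. The real constant lives inside the Ruby bindings' HTTP client, and its exact namespace should be checked against the installed gem version before patching it.

```ruby
# Hypothetical stand-in for an HTTP client with a retry loop shaped like the
# one quoted earlier in this thread; NOT the selenium-webdriver class itself.
class StandInHttpClient
  MAX_RETRIES = 3

  # Runs the block, retrying on connection reset up to MAX_RETRIES times.
  def request
    retries = 0
    begin
      yield
    rescue Errno::ECONNRESET
      raise if retries >= MAX_RETRIES

      retries += 1
      retry
    end
  end
end

# Counts how many times the client attempts a request that always fails.
def count_attempts(client)
  attempts = 0
  begin
    client.request do
      attempts += 1
      raise Errno::ECONNRESET
    end
  rescue Errno::ECONNRESET
    # expected once the retries are exhausted
  end
  attempts
end

client = StandInHttpClient.new
attempts_before = count_attempts(client)

# The monkey patch: drop the constant and redefine it as 0.
# remove_const is private, hence the use of send.
StandInHttpClient.send(:remove_const, :MAX_RETRIES)
StandInHttpClient.const_set(:MAX_RETRIES, 0)
attempts_after = count_attempts(client)

puts "attempts before patch: #{attempts_before}, after patch: #{attempts_after}"
```

Note that the quoted rescue clause only covers ECONNABORTED, ECONNRESET and EADDRINUSE, so whether this loop is what produced the repeated SO_TIMEOUT entries is an assumption; the persistent connection layer may retry independently.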


Reported by wareham.robbie on 2015-02-17 16:59:06

lukeis avatar Mar 04 '16 09:03 lukeis

This issue is still happening in version 2.46.

I am using the C# NuGet package.

Parallel sessions spawned from the test fixture level always result in Selenium
Grid session IDs being lost, resulting in errors like this:

[failed] Test 
Execute
OpenQA.Selenium.WebDriverException: Unexpected error. ERROR Job is not in progress
HResult: -2146233088
   at OpenQA.Selenium.Remote.RemoteWebDriver.UnpackAndThrowOnError(Response errorResponse)
   at OpenQA.Selenium.Remote.RemoteWebDriver.Execute(String driverCommandToExecute,
Dictionary`2 parameters)
   at OpenQA.Selenium.Remote.RemoteWebDriver.FindElement(String mechanism, String value)
   at OpenQA.Selenium.By.FindElement(ISearchContext context)
   at OpenQA.Selenium.Support.PageObjects.DefaultElementLocator.LocateElement(IEnumerable`1
bys)
   at OpenQA.Selenium.Support.PageObjects.WebElementProxy.Invoke(IMessage msg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData,
Int32 type)
   at OpenQA.Selenium.IWebElement.Click()


and this:

Quitting Driver for LinkWorks!
[failed] Test Data: })/LinkWorks
Execute
OpenQA.Selenium.NoSuchElementException: Could not find element by: By.Id: //*[@id='i
am a link']
HResult: -2146233088
   at OpenQA.Selenium.Support.PageObjects.DefaultElementLocator.LocateElement(IEnumerable`1
bys)
   at OpenQA.Selenium.Support.PageObjects.WebElementProxy.Invoke(IMessage msg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData,
Int32 type)
   at OpenQA.Selenium.IWebElement.Click()

I think this post also highlights the issue:
http://stackoverflow.com/a/10598692/359540

Reported by threesixtydegreesolutions on 2015-06-29 21:36:54

lukeis avatar Mar 04 '16 09:03 lukeis

Reported by luke.semerau on 2015-09-17 17:47:30

  • Labels added: Restrict-AddIssueComment-Commit

lukeis avatar Mar 04 '16 09:03 lukeis