[🐛 Bug]: Race condition in Ruby library for Capybara system tests

Open • krschacht opened this issue 5 months ago • 7 comments

What happened?

I've been successfully using Capybara in Rails for quite some time (many months). But about a month ago, my system tests started failing sporadically in my GitHub Actions CI with `Net::ReadTimeout with #<TCPSocket:(closed)>`. If I re-run the test suite a few times, it eventually runs through successfully. I've tried many different workarounds, but none of them avoid the issue. I've also tried rolling back all changes in my repo to the state from months ago, when tests were consistently passing, and that doesn't fix it either.

We've spent many hours investigating, and we currently suspect a race condition somewhere between ChromeDriver and Selenium. My project is open source, so here is a direct link to one of the failed CI runs, where you can see the full stack trace: https://github.com/AllYourBot/hostedgpt/actions/runs/10533347868/job/29189182499?pr=498

The `Net::ReadTimeout` comes from Capybara (via Selenium) failing to get a response from ChromeDriver while attempting to set up the server. One of my engineers outlined his reading of that stack trace:

  • I think the tests run (and fail) before Capybara has started Puma.
  • The test run hung because the server was still running, so Ruby wouldn't exit.
  • The error says the TCP socket was closed. Does this mean the socket was open when the request started but was closed during the exchange, or that it was never open? I suspect the former, because the stack trace is in the middle of a read loop.
  • The failure is in the area of code that causes ChromeDriver to build a new session (i.e., start Chrome up):

Another sign of a race condition: when we SSH into the job mid-run, the suite sometimes fails or hangs for a while, but if I interrupt the process (^C) and re-run it, it completes fine.
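
One way to probe the open question about `#<TCPSocket:(closed)>` is a small stand-alone Ruby experiment. This is a sketch under assumptions, not a reproduction of the bug: `read_timeout_demo` and `refused_demo` are hypothetical helpers, and the local TCP server is only a stand-in for ChromeDriver. It suggests that a `Net::ReadTimeout` needs a connection that was actually established, and that the `(closed)` in the message reflects Net::HTTP closing the socket after the timeout (the message is rendered later). A port with nothing listening fails earlier, with `Errno::ECONNREFUSED`.

```ruby
require "net/http"
require "socket"

# Hypothetical helper (not part of Selenium or Capybara): POST to a local
# server that accepts the TCP connection but never writes a response, and
# return the exception raised by Net::HTTP.
def read_timeout_demo(read_timeout: 0.5)
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  port = server.addr[1]
  held = [] # accepted sockets, kept open but never written to
  acceptor = Thread.new do
    begin
      loop { held << server.accept }
    rescue IOError, SystemCallError
      # raised here once the main thread closes the listening socket
    end
  end

  http = Net::HTTP.new("127.0.0.1", port, nil) # nil: ignore proxy env vars
  http.read_timeout = read_timeout
  # POST is not retried by Net::HTTP, so exactly one read timeout occurs.
  http.start { |conn| conn.request(Net::HTTP::Post.new("/session"), "{}") }
  nil
rescue Net::ReadTimeout => e
  e # by now Net::HTTP has already closed the underlying socket
ensure
  server&.close
  acceptor&.join
  held&.each(&:close)
end

# Hypothetical helper: connect to a port where nothing is listening.
def refused_demo
  probe = TCPServer.new("127.0.0.1", 0)
  port = probe.addr[1]
  probe.close # free the port so that nothing listens on it
  http = Net::HTTP.new("127.0.0.1", port, nil)
  http.open_timeout = 1
  http.start { |conn| conn.get("/status") }
  nil
rescue SystemCallError, Net::OpenTimeout => e
  e
end
```

On a typical Linux machine, `read_timeout_demo.message` should include `(closed)` even though the connection was open when the request was sent, while `refused_demo` returns an `Errno::ECONNREFUSED` instead of a read timeout. The `/session` path and POST method only mirror the W3C WebDriver New Session command; nothing here talks to a real ChromeDriver.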

Capybara version: 3.39.2
Driver information (and browser if relevant): selenium-webdriver (4.23.0) using headless Chrome

How can we reproduce the issue?

1. On GitHub, [fork this repo](https://github.com/AllYourBot/hostedgpt).
2. I've configured the GitHub Actions CI to **not** run system tests on forks. To enable them, (a) [delete this line](https://github.com/AllYourBot/hostedgpt/blob/main/.github/workflows/rubyonrails.yml#L49) to remove the short circuit, and (b) change the very next `runs-on` line back to `ubuntu-latest`, the default GitHub Actions runner.
3. Push a change to the repo to trigger a GitHub CI run.

Relevant log output

See the full stack trace here: https://github.com/AllYourBot/hostedgpt/actions/runs/10533347868/job/29189182499?pr=498

Operating System

Alpine Linux

Selenium version

4.23.0 (selenium-webdriver gem)

What are the browser(s) and version(s) where you see this issue?

Chrome

What are the browser driver(s) and version(s) where you see this issue?

ChromeDriver; I'm not sure how to get the exact version, but I think it's the latest.

Are you using Selenium Grid?

No

krschacht, Aug 28 '24 22:08