selenium
selenium copied to clipboard
[🐛 Bug]: Performance regression of Selenium 4.4
What happened?
Hi, I upgraded Selenium server to 4.4, and now I constantly get 5x or even over 20x performance decrease in comparison to 4.3.
This happens on our Github Actions build, and is even more significant on my local Linux machine (without Docker).
This is a typical result for set of ~15 tests ( = 15 new session requests followed by few simple commands) on GH Actions:
Locally, the difference much more significant: from 5 seconds with 4.3.0 to 2 minutes with 4.4.0 (24x slower).
The test run is: https://github.com/OndraM/php-webdriver/actions/runs/2943219537
From what I noted from the Selenium log, with version 4.4, there is always a big delay before line Session request received by the Distributor
, for example
13:35:01.350 DEBUG [DefaultChannelPool$IdleChannelDetector.run] - Closed 1 connections out of 1 in 0 ms
13:35:15.392 INFO [LocalDistributor.newSession] - Session request received by the Distributor:
note the 15 seconds of no logged operation. While with 4.3.0 the events follow almost instantaneously, for example:
13:34:21.646 DEBUG [RequestConverter.channelRead0] - Closing input pipe.
13:34:21.661 INFO [LocalDistributor.newSession] - Session request received by the Distributor:
For the tests, PHP language bindings for WebDriver are used - but the handshake and everything should be W3C WebDriver compatible, and was working OK with Selenium 4.3.0.
Please let me know if I can provide any more information or somehow help tracing the issue.
How can we reproduce the issue?
Run Github actions on branch https://github.com/OndraM/php-webdriver/tree/test/selenium-4-performance
Or run locally:
-
git clone -b test/selenium-4-performance [email protected]:OndraM/php-webdriver.git
-
cd php-webdriver/
-
composer install
- (in new terminal)
java -jar selenium-server-4.4.0.jar standalone
- (in new terminal)
php -S 127.0.0.1:8000 -t tests/functional/web/
-
vendor/bin/phpunit --filter WebDriverByTest -v --debug
Relevant log output
I enabled the finest log level, here are the test runs:
(Selenium server output is dumped at the end, in the "Prints log" part - search for cat ./logs/selenium-server.log
)
Operating System
Ubuntu (Github Actions), Arch (locally)
Selenium version
4.4.0 (jar server), php-webdriver 1.12.1
What are the browser(s) and version(s) where you see this issue?
Chrome 104 (headless)
What are the browser driver(s) and version(s) where you see this issue?
Chromedriver 104.0.5112.79
Are you using Selenium Grid?
4.4.0 in standalone mode
@OndraM, thank you for creating this issue. We will troubleshoot it as soon as we can.
Info for maintainers
Triage this issue by using labels.
If information is missing, add a helpful comment and then I-issue-template
label.
If the issue is a question, add the I-question
label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted
label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C),
add the applicable G-*
label, and it will provide the correct link and auto-close the
issue.
After troubleshooting the issue, please add the R-awaiting answer
label.
Thank you!
Looking at the logs for this... The "Session request received by the Distributor" is very quick in 4.3, and takes up to 15 seconds in 4.4
Not sure if it matters, but 4.3 doesn't have any of these DefaultChannelPool$IdleChannelDetector
entries.
13:35:00.817 DEBUG [RequestConverter.channelRead0] - Incoming message: DefaultLastHttpContent(data: PooledSlicedByteBuf(ridx: 0, widx: 328, cap: 328/328, unwrapped: PooledUnsafeDirectByteBuf(ridx: 476, widx: 476, cap: 2048)), decoderResult: success)
13:35:00.817 DEBUG [RequestConverter.channelRead0] - Closing input pipe.
13:35:00.818 DEBUG [SpanWrappedHttpHandler.execute] - Wrapping request. Before OpenTelemetryContext{tracer=io.opentelemetry.sdk.trace.SdkTracer@5301193a, context={}, span id=0000000000000000, trace id=00000000000000000000000000000000} and after OpenTelemetryContext{tracer=io.opentelemetry.sdk.trace.SdkTracer@5301193a, context={opentelemetry-trace-span-key=SdkSpan{traceId=9e11407a8b9c375a9fbf9d0783df7920, spanId=16a39e934be186c6, parentSpanContext=ImmutableSpanContext{traceId=00000000000000000000000000000000, spanId=0000000000000000, traceFlags=00, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=false}, name=session_queue, kind=INTERNAL, attributes=AttributesMap{data={random.key=1d6b8d48-a1c6-4b0c-9656-2bb08eb9a0ae}, capacity=128, totalAddedValues=1}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1661693700818000000, endEpochNanos=0}}, span id=16a39e934be186c6, trace id=9e11407a8b9c375a9fbf9d0783df7920}
13:35:00.818 DEBUG [HttpTracing.inject] - Injecting (POST) /session into OpenTelemetrySpan{traceId=9e11407a8b9c375a9fbf9d0783df7920,spanId=16a39e934be186c6} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84
13:35:01.350 DEBUG [DefaultChannelPool$IdleChannelDetector.run] - Entry count for : http://localhost:43625 : 1
13:35:01.350 DEBUG [DefaultChannelPool$IdleChannelDetector.expiredChannels] - Adding Candidate expired Channel [id: 0xff143818, L:/127.0.0.1:36634 ! R:localhost/127.0.0.1:43625] isIdleTimeoutExpired=false isRemotelyClosed=true isTtlExpired=false
13:35:01.350 DEBUG [DefaultChannelPool$IdleChannelDetector.closeChannels] - Closing Idle Channel [id: 0xff143818, L:/127.0.0.1:36634 ! R:localhost/127.0.0.1:43625]
13:35:01.350 DEBUG [DefaultChannelPool$IdleChannelDetector.run] - Closed 1 connections out of 1 in 0 ms
13:35:15.392 INFO [LocalDistributor.newSession] - Session request received by the Distributor:
I threw the logs into gists for easier parsing: https://gist.github.com/titusfortner/4fc1e999e66012a8c40caaa85682b785 https://gist.github.com/titusfortner/f6e6dda3e542f38c17a81a285cb1a1c9
also happens to me on 4.3.0
So while comparing the changes between 4.3.0 and 4.4.0 I noticed that the DEFAULT_RETRY_INTERVAL
was changed from 0 to 15
(was done in this commit)
This exactly matches the delay observed in the issue which is always around 15s
So just for fun I set the --session-retry-interval
to 0
(as it was before) however the problem was still present.
When however setting it to 1
I was able to get a Server/Grid response within 2-3s instead of 15s.
So for now using --session-retry-interval 1
looks like a workaround.
So will DEFAULT_RETRY_INTERVAL
be changed back to 0 or near 0? Otherwise it seems a lot of people will have slow executions and not know why.
In the mean time, can --session-retry-interval 1
workaround be used in the selenium docker images somehow?
@brondsem you can provide it with the SE_OPTS environment variable
Thanks for reporting this. In Docker you can set the value through SE_SESSION_RETRY_INTERVAL
. I will make changes so this goes back to the previous behavior.
I don't think modifying session-retry-interval addresses the speed discrepancy between 3.141 and 4.x
I don't think modifying session-retry-interval addresses the speed discrepancy between 3.141 and 4.x
Why not?
This particular issue is about the specific change made in 4.4 that increased time for Session request received by the Distributor
. If there is an additional speed-related issue to address, please open a new ticket with a reproducible case.
@diemol when will be the fix available please?
We're going to try to get a patch release out tomorrow.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.