[🐛 Bug]: Selenium Grid, request stuck in retry loop
What happened?
Everything was working fine with Selenium Grid 3, but when we switched over to Selenium Grid 4 we started experiencing steps in our test cases sometimes taking ~18 minutes longer to complete. Looking at the logs, it seems that a request gets stuck in a retry loop. We have observed it for several different types of requests (e.g. /execute/sync, /frame/, /se/log).
We are running Selenium Grid with Docker/Podman using the following compose file:
hub:
image: selenium/hub:4.2.0-20220527
container_name: hub
ports:
- "4442:4442"
- "4443:4443"
- "4444:4444"
restart: always
environment:
- SE_SESSION_REQUEST_TIMEOUT=2400
node:
image: selenium/node-chrome:4.2.0-20220527
shm_size: 2gb
container_name: node
depends_on:
- hub
restart: always
environment:
- SE_EVENT_BUS_HOST=hub
- SE_EVENT_BUS_PUBLISH_PORT=4442
- SE_EVENT_BUS_SUBSCRIBE_PORT=4443
- SE_NODE_SESSION_TIMEOUT=300
- SE_NODE_OVERRIDE_MAX_SESSIONS=true
- SE_NODE_MAX_SESSIONS=4
- SCREEN_WIDTH=1920
- SCREEN_HEIGHT=1080
- SCREEN_DEPTH=24
volumes:
- /dev/shm:/dev/shm
ports:
- "9002:5900"
links:
- hub
I'll try to explain the scenario we have observed in the logs, starting from the request before the one that gets stuck.
1. Hub: Sends previous request.
13:36:23.034 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x7d412036, L:/10.88.0.31:51890 - R:/10.88.0.31:5555] for GET '/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'
2. Node: Previous request gets a channel.
13:36:23.036 DEBUG [NettyRequestSender.pollPooledChannel] - Using pooled Channel '[id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]' for 'GET' to 'http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled' 13:36:23.036 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380] for GET '/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'
3. Node: Previous request is sent and receives response.
13:36:23.043 DEBUG [HttpHandler.handleHttpResponse] - Request DefaultFullHttpRequest(decodeResult: success, version: HTTP/1.1, content: EmptyByteBufBE) GET /session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled HTTP/1.1 Accept: application/json, image/png User-Agent: selenium/4.0.0 (.net linux) Connection: keep-alive Cache-Control: no-cache traceparent: 00-f1d0e00c8e477eda9572995e0b7da776-629612b9dedff6a5-01 host: localhost:60380 Content-Type: application/json; charset=utf-8
Response DefaultHttpResponse(decodeResult: success, version: HTTP/1.1) HTTP/1.1 200 OK Content-Type: application/json; charset=utf-8 cache-control: no-cache content-length: 14
4. Node: Channel for previous request is offered back to pool.
13:36:23.044 DEBUG [ChannelManager.tryToOfferChannelToPool] - Adding key: http://localhost:60380 for channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]
5. Hub: Sends new request.
13:36:23.055 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x7d412036, L:/10.88.0.31:51890 - R:/10.88.0.31:5555] for POST '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'
6. Node: New request is processed, but it freezes after "injecting" log (where is normally would say "Using __ channel").
13:36:23.059 DEBUG [HttpTracing.inject] - Injecting (POST) /session/20135a43e0062e63b125ccdf0a75ab23/execute/sync into OpenTelemetrySpan{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da,spanId=31fc6d6d7e2fbf62} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84
7. Node: About a minuter later the idleChannelDetector finds that the channel used by the previous request is idle, and closes it.
13:37:23.557 DEBUG [DefaultChannelPool$IdleChannelDetector.expiredChannels] - Adding Candidate expired Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380] isIdleTimeoutExpired=true isRemotelyClosed=false isTtlExpired=false 13:37:23.557 DEBUG [DefaultChannelPool$IdleChannelDetector.closeChannels] - Closing Idle Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]
8. Hub: About two minuter after that, the hub gets server error and retries (180 s default timeout).
13:39:23.076 DEBUG [RetryRequest.lambda$static$7] - Failure due to server error #1. Retrying. 13:39:23.082 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0xc39d1690, L:/10.88.0.31:52094 - R:/10.88.0.31:5555]' for 'POST' to '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'
9. Node: the frozen request starts running and gets a channel, then gets stuck again.
13:39:23.087 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0x3d99f31f, L:/127.0.0.1:39510 - R:localhost/127.0.0.1:60380]' for 'POST' to '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'
10. Node: The new retry request gets frozen after the "injecting" log.
13:39:23.089 DEBUG [HttpTracing.inject] - Injecting (POST) /session/20135a43e0062e63b125ccdf0a75ab23/execute/sync into OpenTelemetrySpan{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da,spanId=656ae3b1f9818e50} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84
11. Five minutes after the call in step (9) the node gets a timeout exception for the first frozen request.
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
... 32 more
13:44:23.082 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da, spanId=31fc6d6d7e2fbf62, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da, spanId=6ee9490e9202f7ef, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=https://opentelemetry.io/schemas/1.9.0, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.14.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null}, name=node.forward_command, kind=INTERNAL, startEpochNanos=1655904983059769000, endEpochNanos=1655905463082764947, attributes=AttributesMap{data={http.method=POST, random.key=e65233dd-8b7d-4030-89f9-c26c7e17877e, span.kind=server, http.target=/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync, error=true}, capacity=128, totalAddedValues=5}, totalAttributeCount=5, events=[ImmutableEventData{name=exception, attributes={exception.message="Unable to execute request: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown", exception.stacktrace="org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
... 32 more
", exception.type="org.openqa.selenium.TimeoutException", http.flavor=1, http.handler_class="org.openqa.selenium.remote.http.Route$PredicatedRoute", http.host="hub:4444", http.method="POST", http.request_content_length="99", http.scheme="HTTP", http.target="/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync", http.user_agent="selenium/4.0.0 (.net linux)"}, epochNanos=1655905463082460855, totalAttributeCount=11}], totalRecordedEvents=1, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=Kind: UNKNOWN Description:}, hasEnded=true}
13:44:23.082 WARN [SeleniumSpanExporter$1.lambda$export$1] - Unable to execute request: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
13:44:23.082 WARN [SeleniumSpanExporter$1.lambda$export$1] - org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
... 32 more
13:44:23.083 WARN [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "a780c6e2ef7b2b647a0c64e1a3a0a9da","eventTime": 1655905463082460855,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: java.util.concurrent.TimeoutException\nBuild info: version: '4.2.0', revision: '86eb611648'\nSystem info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.2.0', revision: '86eb611648'\nSystem info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)\n\tat org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)\n\tat org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)\n\tat org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.node.Node.execute(Node.java:240)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)\n\t... 32 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.remote.http.Route$PredicatedRoute","http.host": "hub:4444","http.method": "POST","http.request_content_length": "99","http.scheme": "HTTP","http.target": "\u002fsession\u002f20135a43e0062e63b125ccdf0a75ab23\u002fexecute\u002fsync","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}
13:44:23.157 DEBUG [TimeoutTimerTask.expire] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms for NettyResponseFuture{currentRetry=0,
isDone=0,
isCancelled=0,
asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@144965c7,
nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4595be47,
future=java.util.concurrent.CompletableFuture@43b022a9[Not completed],
uri=http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync,
keepAlive=true,
redirectCount=0,
timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@4f3f3e4a,
inAuth=0,
touch=1655905163087} after 300079 ms
13:44:23.157 DEBUG [ChannelManager.closeChannel] - Closing Channel [id: 0x3d99f31f, L:/127.0.0.1:39510 - R:localhost/127.0.0.1:60380]
13:44:23.158 DEBUG [NettyRequestSender.abort] - Aborting Future NettyResponseFuture{currentRetry=0,
isDone=0,
isCancelled=0,
asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@144965c7,
nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4595be47,
future=java.util.concurrent.CompletableFuture@43b022a9[Not completed],
uri=http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync,
keepAlive=true,
redirectCount=0,
timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@4f3f3e4a,
inAuth=0,
touch=1655905163087}
13:44:23.158 DEBUG [NettyRequestSender.abort] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:60380 after 300000 ms
at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
13:44:23.158 DEBUG [AsyncCompletionHandler.onThrowable] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:60380 after 300000 ms
at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
-
And it continues like this. The hub retrying every 3 minutes, the request getting stuck in the node after "injecting" log and being released when a new retry request is made. The requests that are started in the node get a timeout exceptions after 5 minutes. Meanwhile other requests for other sessions are running in parallel.
-
After the hub has retried 5 times, the client starts making other calls to the hub with the same session. After that everything seems to work again.
How can we reproduce the issue?
-
Relevant log output
Logs in description above.
Operating System
Docker
Selenium version
C# 4.2.0
What are the browser(s) and version(s) where you see this issue?
selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image
What are the browser driver(s) and version(s) where you see this issue?
selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image
Are you using Selenium Grid?
4.2.0 (but issue also occurred with earlier versions of Selenium Grid 4)
@eete22, thank you for creating this issue. We will troubleshoot it as soon as we can.
Info for maintainers
Triage this issue by using labels.
If information is missing, add a helpful comment and then I-issue-template label.
If the issue is a question, add the I-question label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C),
add the applicable G-* label, and it will provide the correct link and auto-close the
issue.
After troubleshooting the issue, please add the R-awaiting answer label.
Thank you!
This looks similar to #9528, which we are still investigating.
Can you please try the new Java 11 HTTP client and let us know? https://www.selenium.dev/blog/2022/using-java11-httpclient/
We updated to version 4.5.2 of the docker image which looks like it is using the Java 11 HTTP client by default. From what we can see when running our tests with this image, the issue is resolved. Great work!
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.