docker-selenium icon indicating copy to clipboard operation
docker-selenium copied to clipboard

[🐛 Bug]: Random GOAWAY frame received from Selenium Grid

Open cezzarez opened this issue 1 year ago • 5 comments

What happened?

We have a Selenium Grid setup using KEDA and Selenium Grid helm chart. KEDA v.2.13.0 and Selenium Grid helm chart v.0.28.3. When we are testing some of the pages we randomly received the:

Caused by: java.io.IOException: /10.251.7.8:35426: GOAWAY received From Selenium Grid.

The settings we have for Egde node: SE_NODE_SESSION_TIMEOUT=600 (because first we saw the timeout was reached, but 600 is much more than we need for tests) SE_SESSION_REQUEST_TIMEOUT=450 (a lot of values tested here, looks like it does not change anything) SE_JAVA_OPTS=-Dwebdriver.http.factory=jdk-http-client -Djdk.httpclient.websocket.intermediateBufferSize=3000000 (also different values tested for Buffer, but did not see any change for that) shm size = 4Gi (it was also increase/decreased and I do not see any changes for this param)

In the HUB I even change the Healthcheck interval from 120 to 300 because I though, that maybe it is because of the timeout.

I launch the logs level to ALL and to be honest still nothing special on EDGE or HUB hosts observed.

In the EDGE I can see the error: 09:27:16.633 DEBUG [PlainHttpConnection.close] - [HttpClient-1-SelectorManager] [186s 134ms] PlainHttpConnection(SocketTube(1)) Closing channel: channel registered with selector, key.interestOps=1, sa.interestOps=1 09:27:16.634 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-1-SelectorManager] [186s 134ms] SocketTube(1) got read error: java.io.IOException: connection closed locally 09:27:16.634 DEBUG [SocketTube.debugState] - [HttpClient-1-SelectorManager] [186s 134ms] SocketTube(1) leaving read() loop after EOF: Reading: [ops=0, demand=0, stopped=true], Writing: [ops=0, demand=1] 09:27:16.634 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-1-SelectorManager] [186s 134ms] SocketTube(1) Read scheduler stopped 09:27:16.634 DEBUG [HttpClientImpl$SelectorAttachment.resetInterestOps] - [HttpClient-1-SelectorManager] [186s 134ms] SelectorAttachment key cancelled for java.nio.channels.SocketChannel[closed] 09:27:16.635 DEBUG [SocketTube$SocketFlowEvent.abort] - [HttpClient-1-SelectorManager] [186s 135ms] SocketTube(1) abort: java.io.IOException: java.nio.channels.CancelledKeyException 09:27:16.635 DEBUG [HttpClientImpl$SelectorManager.run] - [HttpClient-1-SelectorManager] [186s 135ms] HttpClientImpl(1) next timeout: 0 09:27:16.635 DEBUG [HttpClientImpl$SelectorManager.run] - [HttpClient-1-SelectorManager] [186s 135ms] HttpClientImpl(1) next expired: 0 09:27:16.635 DEBUG [HttpClientImpl$SelectorManager.run] - [HttpClient-1-SelectorManager] [186s 135ms] HttpClientImpl(1) Next deadline is 3000 09:27:16.643 DEBUG [UrlChecker.lambda$waitUntilUnavailable$2] - Polling http://localhost:31879/shutdown 09:27:16.643 FINEST [HttpURLConnection.plainConnect0] - ProxySelector Request for http://localhost:31879/shutdown

But I cannot find anything interesting in the issues here or Stack Overflow about this "part" of logs...

In the HUB logs (ALL level) nothig what could suggest anythig.

So what happened...? No idea, we run the tests, we see the pod is created, but sometimes (in a very random moment) the GOAWAY frame is received. It suggest the Selenium is closed, but in logs there is no information what exactly is the problem. I am now sure it is not because of the memory, timeouts or anything like that, but maybe you as an experts know what could it be?

I want to just add, that sometimes the tests are going well, and are finished with SUCCESS state.

Command used to start Selenium Grid with Docker (or Kubernetes)

It would be just helm install for Selenium Grid helm chart on k8s.
You have to also set the autoscale to true to install KEDA.

Relevant log output

In the EDGE I can see the error: 
09:27:16.633 DEBUG [PlainHttpConnection.close] - [HttpClient-1-SelectorManager] [186s 134ms] PlainHttpConnection(SocketTube(1)) Closing channel: channel registered with selector, key.interestOps=1, sa.interestOps=1
09:27:16.634 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-1-SelectorManager] [186s 134ms] SocketTube(1) got read error: java.io.IOException: connection closed locally
09:27:16.634 DEBUG [SocketTube.debugState] - [HttpClient-1-SelectorManager] [186s 134ms] SocketTube(1) leaving read() loop after EOF:  Reading: [ops=0, demand=0, stopped=true], Writing: [ops=0, demand=1]
09:27:16.634 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-1-SelectorManager] [186s 134ms] SocketTube(1) Read scheduler stopped
09:27:16.634 DEBUG [HttpClientImpl$SelectorAttachment.resetInterestOps] - [HttpClient-1-SelectorManager] [186s 134ms] SelectorAttachment key cancelled for java.nio.channels.SocketChannel[closed]
09:27:16.635 DEBUG [SocketTube$SocketFlowEvent.abort] - [HttpClient-1-SelectorManager] [186s 135ms] SocketTube(1) abort: java.io.IOException: java.nio.channels.CancelledKeyException
09:27:16.635 DEBUG [HttpClientImpl$SelectorManager.run] - [HttpClient-1-SelectorManager] [186s 135ms] HttpClientImpl(1) next timeout: 0
09:27:16.635 DEBUG [HttpClientImpl$SelectorManager.run] - [HttpClient-1-SelectorManager] [186s 135ms] HttpClientImpl(1) next expired: 0
09:27:16.635 DEBUG [HttpClientImpl$SelectorManager.run] - [HttpClient-1-SelectorManager] [186s 135ms] HttpClientImpl(1) Next deadline is 3000
09:27:16.643 DEBUG [UrlChecker.lambda$waitUntilUnavailable$2] - Polling http://localhost:31879/shutdown
09:27:16.643 FINEST [HttpURLConnection.plainConnect0] - ProxySelector Request for http://localhost:31879/shutdown

From Jenkins, where we run the tests I can see randomly:
Caused by: java.io.UncheckedIOException: java.io.IOException: /10.251.7.8:35426: GOAWAY received

Operating System

Kubernetes (EKS)

Docker Selenium version (image tag)

4.18.1-20240224

Selenium Grid chart version (chart version)

0.28.3

cezzarez avatar Apr 07 '24 09:04 cezzarez

@cezzarez, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

github-actions[bot] avatar Apr 07 '24 09:04 github-actions[bot]

Why do you need these options?

SE_JAVA_OPTS=-Dwebdriver.http.factory=jdk-http-client -Djdk.httpclient.websocket.intermediateBufferSize=3000000

-Dwebdriver.http.factory=jdk-http-client is the default one since 4.14.

What test script can we use to replicate the issue?

diemol avatar Apr 09 '24 10:04 diemol

Hello @diemol ! Thank you for the reply. About the options, in the previous versions I was using them, as it was the internal requirement, probably missed the note about the default settings. Just removed them.

About the replication, not sure if it will be possible to do on your site... Our tests are using internal pages and the source code cannot be shared.

The tests are pretty complex also (they can took some time), but as I mentioned it is for sure not the timeout problem, because the GOAWAY frame is very random.

Haven't you see similar cases before? Maybe you have some hints, what could be the reason of it?

cezzarez avatar Apr 09 '24 13:04 cezzarez

I've never seen this reported before, so I would not know where to start without a way to reproduce it. Could it be something in your infrastructure?

diemol avatar Apr 09 '24 13:04 diemol

@diemol faced the same issue I guess. I have selenium grid 4.21.0, edge node browser 4.21.0. When I use serenity 4.0.1 (selenium 4.12.0) tests work correctly, but when I try to upgrade (e.g. 4.1.4 - selenium 4.18.1) then I receive this error after 15s of execution (execution works fine but then suddenly GOAWAY).

I use also io.github.bonigarcia:webdrivermanager.

here something similar: https://stackoverflow.com/questions/75896279/selenium-remote-run-java-io-ioexception-goaway-received

Session ID: <hidden id>
        at app//org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:546)
        at app//org.openqa.selenium.remote.ElementLocation$ElementFinder$2.findElement(ElementLocation.java:165)
        at app//org.openqa.selenium.remote.ElementLocation.findElement(ElementLocation.java:59)
        at app//org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:356)
        at app//org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:350)
        at app//net.thucydides.core.webdriver.WebDriverFacade.findElement(WebDriverFacade.java:259)
        at app//net.serenitybdd.core.pages.WebElementResolverByLocator.resolveForDriver(WebElementResolverByLocator.java:25)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.getResolvedElement(WebElementFacadeImpl.java:188)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.getElement(WebElementFacadeImpl.java:183)
        at app//net.serenitybdd.core.pages.WebElementExpectations$3.apply(WebElementExpectations.java:63)
        at app//net.serenitybdd.core.pages.WebElementExpectations$3.apply(WebElementExpectations.java:55)
        at app//org.openqa.selenium.support.ui.FluentWait.until(FluentWait.java:203)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.waitUntilEnabled(WebElementFacadeImpl.java:1136)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.waitUntilElementAvailable(WebElementFacadeImpl.java:908)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.click(WebElementFacadeImpl.java:1221)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.click(WebElementFacadeImpl.java:1241)
        at <hidden part>

        Caused by:
        java.io.UncheckedIOException: java.io.IOException: /<hidden ip>: GOAWAY received
            at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:443)
            at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
            at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:55)
            at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:359)
            at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:54)
            at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:187)
            at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
            at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:519)
            at org.openqa.selenium.remote.ElementLocation$ElementFinder$2.findElement(ElementLocation.java:165)
            at org.openqa.selenium.remote.ElementLocation.findElement(ElementLocation.java:59)
            at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:356)
            at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:350)
            at net.thucydides.core.webdriver.WebDriverFacade.findElement(WebDriverFacade.java:259)
            at net.serenitybdd.core.pages.WebElementResolverByLocator.resolveForDriver(WebElementResolverByLocator.java:25)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.getResolvedElement(WebElementFacadeImpl.java:188)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.getElement(WebElementFacadeImpl.java:183)
            at net.serenitybdd.core.pages.WebElementExpectations$3.apply(WebElementExpectations.java:63)
            at net.serenitybdd.core.pages.WebElementExpectations$3.apply(WebElementExpectations.java:55)
            at org.openqa.selenium.support.ui.FluentWait.until(FluentWait.java:203)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.waitUntilEnabled(WebElementFacadeImpl.java:1136)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.waitUntilElementAvailable(WebElementFacadeImpl.java:908)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.click(WebElementFacadeImpl.java:1221)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.click(WebElementFacadeImpl.java:1241)
           <hidden part>
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.base/java.lang.reflect.Method.invoke(Method.java:566)
            at io.cucumber.java.Invoker.doInvoke(Invoker.java:66)
            at io.cucumber.java.Invoker.invoke(Invoker.java:24)
            at io.cucumber.java.AbstractGlueDefinition.invokeMethod(AbstractGlueDefinition.java:47)
            at io.cucumber.java.JavaStepDefinition.execute(JavaStepDefinition.java:29)
            at io.cucumber.core.runner.CoreStepDefinition.execute(CoreStepDefinition.java:66)
            at io.cucumber.core.runner.PickleStepDefinitionMatch.runStep(PickleStepDefinitionMatch.java:63)
            at io.cucumber.core.runner.ExecutionMode$1.execute(ExecutionMode.java:10)
            at io.cucumber.core.runner.TestStep.executeStep(TestStep.java:84)
            at io.cucumber.core.runner.TestStep.run(TestStep.java:56)
            at io.cucumber.core.runner.PickleStepTestStep.run(PickleStepTestStep.java:51)
            at io.cucumber.core.runner.TestCase.run(TestCase.java:84)
            at io.cucumber.core.runner.Runner.runPickle(Runner.java:75)
            at io.cucumber.junit.PickleRunners$NoStepDescriptions.lambda$run$0(PickleRunners.java:151)
            at io.cucumber.core.runtime.CucumberExecutionContext.lambda$runTestCase$5(CucumberExecutionContext.java:137)
            at io.cucumber.core.runtime.RethrowingThrowableCollector.executeAndThrow(RethrowingThrowableCollector.java:23)
            at io.cucumber.core.runtime.CucumberExecutionContext.runTestCase(CucumberExecutionContext.java:137)
            at io.cucumber.junit.PickleRunners$NoStepDescriptions.run(PickleRunners.java:148)
            at io.cucumber.junit.FeatureRunner.runChild(FeatureRunner.java:144)
            at io.cucumber.junit.FeatureRunner.runChild(FeatureRunner.java:28)
            at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
            at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
            at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
            at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
            at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
            at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
            at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
            at io.cucumber.junit.FeatureRunner.run(FeatureRunner.java:137)
            at io.cucumber.junit.CucumberSerenityBaseRunner.runChild(CucumberSerenityBaseRunner.java:332)
            at io.cucumber.junit.CucumberSerenityBaseRunner.runChild(CucumberSerenityBaseRunner.java:53)
            at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
            at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
            at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
            at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
            at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
            at io.cucumber.junit.CucumberSerenityBaseRunner$RunCucumber.evaluate(CucumberSerenityBaseRunner.java:363)
            at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
            at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
            at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:108)
            at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
            at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
            at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
            at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.base/java.lang.reflect.Method.invoke(Method.java:566)
            at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
            at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
            at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
            at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
            at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
            at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
            at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
            at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
            at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
            at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
            at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:113)
            at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
            at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
            at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)

            Caused by:
            java.io.IOException: /<hidden ip>: GOAWAY received
                at java.net.http/jdk.internal.net.http.Http2Connection.handleGoAway(Http2Connection.java:999)
                at java.net.http/jdk.internal.net.http.Http2Connection.handleConnectionFrame(Http2Connection.java:867)
                at java.net.http/jdk.internal.net.http.Http2Connection.processFrame(Http2Connection.java:738)
                at java.net.http/jdk.internal.net.http.frame.FramesDecoder.decode(FramesDecoder.java:155)
                at java.net.http/jdk.internal.net.http.Http2Connection$FramesController.processReceivedData(Http2Connection.java:232)
                at java.net.http/jdk.internal.net.http.Http2Connection.asyncReceive(Http2Connection.java:663)
                at java.net.http/jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.processQueue(Http2Connection.java:1292)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
                at java.net.http/jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.runOrSchedule(Http2Connection.java:1310

kakliniew avatar Jun 26 '24 13:06 kakliniew

@diemol could you look into this problem? I removed bonigarcia and saw in logs that after update it doesn't have 'ms:edgeOptions: {args: [--remote-allow-origins=*], extensions: []}}]', but I fixed that, still GOAWAY

kakliniew avatar Jul 12 '24 13:07 kakliniew

How can I look into it if there is not a way to reproduce it?

diemol avatar Jul 12 '24 19:07 diemol

@diemol thanks for answer. I thought perhaps the description and stacktrace would be sufficient. I can't share the code, but I will try to create code from scratch to reproduce the error

kakliniew avatar Jul 12 '24 20:07 kakliniew

I wrote all the details in #2328. The chart also includes config keys for annotations to disable HTTP/2 if TLS termination is used.

VietND96 avatar Jul 27 '24 15:07 VietND96

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] avatar Aug 27 '24 00:08 github-actions[bot]