kafka-ui icon indicating copy to clipboard operation
kafka-ui copied to clipboard

SSO occasional Connection reset error

Open ognjenVlad opened this issue 1 year ago • 10 comments

Issue submitter TODO list

  • [X] I've looked up my issue in FAQ
  • [X] I've searched for an already existing issues here
  • [X] I've tried running main-labeled docker image and the issue still persists there
  • [X] I'm running a supported version of the application which is listed here

Describe the bug (actual behavior)

When using SSO, web-ui sometimes returns 500. Hard to find out when exactly, but seems like it happens when coming back to web-ui after some time, although sometimes it happens when you just try to login randomly. It is always redirection to redirect-uri login/oauth2/code/{client}

{"code":5000,"message":"Connection reset","timestamp":1720450398132,"requestId":"e25340f5-656","fieldsErrors":null,"stackTrace":"org.springframework.web.reactive.function.client.WebClientRequestException: Connection reset\n\tat org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:136)\n\tSuppressed: The stacktrace has been enhanced by Reactor, refer to additional information below: \nError has been observed at the following site(s):\n\t*__checkpoint ⇢ Request to POST https://***/token [DefaultWebClient]\n\t*__checkpoint ⇢ OAuth2LoginAuthenticationWebFilter [DefaultWebFilterChain]\n\t*__checkpoint ⇢ OAuth2AuthorizationRequestRedirectWebFilter [DefaultWebFilterChain]\n\t*__checkpoint ⇢ ReactorContextWebFilter [DefaultWebFilterChain]\n\t*__checkpoint ⇢ HttpHeaderWriterWebFilter [DefaultWebFilterChain]\n\t*__checkpoint ⇢ ServerWebExchangeReactorContextWebFilter [DefaultWebFilterChain]\n\t*__checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]\n\t*__checkpoint ⇢ org.springframework.web.filter.reactive.ServerHttpObservationFilter [DefaultWebFilterChain]\n\t*__checkpoint ⇢ HTTP GET \"/login/oauth2/code/test?code=***\" [ExceptionHandlingWebHandler]\nOriginal Stack Trace:\n\t\tat org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:136)\n\t\tat reactor.core.publisher.MonoErrorSupplied.subscribe(MonoErrorSupplied.java:55)\n\t\tat reactor.core.publisher.Mono.subscribe(Mono.java:4495)\n\t\tat reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)\n\t\tat reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)\n\t\tat reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)\n\t\tat reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)\n\t\tat reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93)\n\t\tat reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onError(MonoFlatMapMany.java:204)\n\t\tat reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)\n\t\tat reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.whenError(FluxRetryWhen.java:225)\n\t\tat reactor.core.publisher.FluxRetryWhen$RetryWhenOtherSubscriber.onError(FluxRetryWhen.java:274)\n\t\tat reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onError(FluxContextWrite.java:121)\n\t\tat reactor.core.publisher.FluxConcatMapNoPrefetch$FluxConcatMapNoPrefetchSubscriber.maybeOnError(FluxConcatMapNoPrefetch.java:326)\n\t\tat reactor.core.publisher.FluxConcatMapNoPrefetch$FluxConcatMapNoPrefetchSubscriber.onNext(FluxConcatMapNoPrefetch.java:211)\n\t\tat reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107)\n\t\tat reactor.core.publisher.SinkManyEmitterProcessor.drain(SinkManyEmitterProcessor.java:471)\n\t\tat reactor.core.publisher.SinkManyEmitterProcessor$EmitterInner.drainParent(SinkManyEmitterProcessor.java:615)\n\t\tat reactor.core.publisher.FluxPublish$PubSubInner.request(FluxPublish.java:873)\n\t\tat reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.request(FluxContextWrite.java:136)\n\t\tat reactor.core.publisher.FluxConcatMapNoPrefetch$FluxConcatMapNoPrefetchSubscriber.request(FluxConcatMapNoPrefetch.java:336)\n\t\tat reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.request(FluxContextWrite.java:136)\n\t\tat reactor.core.publisher.Operators$DeferredSubscription.request(Operators.java:1717)\n\t\tat reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onError(FluxRetryWhen.java:192)\n\t\tat reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:201)\n\t\tat reactor.netty.http.client.HttpClientConnect$HttpObserver.onUncaughtException(HttpClientConnect.java:403)\n\t\tat reactor.netty.ReactorNetty$CompositeConnectionObserver.onUncaughtException(ReactorNetty.java:703)\n\t\tat reactor.netty.resources.DefaultPooledConnectionProvider$DisposableAcquire.onUncaughtException(DefaultPooledConnectionProvider.java:223)\n\t\tat reactor.netty.resources.DefaultPooledConnectionProvider$PooledConnection.onUncaughtException(DefaultPooledConnectionProvider.java:476)\n\t\tat reactor.netty.channel.FluxReceive.drainReceiver(FluxReceive.java:247)\n\t\tat reactor.netty.channel.FluxReceive.onInboundError(FluxReceive.java:468)\n\t\tat reactor.netty.channel.ChannelOperations.onInboundError(ChannelOperations.java:515)\n\t\tat reactor.netty.channel.ChannelOperationsHandler.exceptionCaught(ChannelOperationsHandler.java:145)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317)\n\t\tat io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireExceptionCaught(CombinedChannelDuplexHandler.java:424)\n\t\tat io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:92)\n\t\tat io.netty.channel.CombinedChannelDuplexHandler$1.fireExceptionCaught(CombinedChannelDuplexHandler.java:145)\n\t\tat io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:143)\n\t\tat io.netty.channel.CombinedChannelDuplexHandler.exceptionCaught(CombinedChannelDuplexHandler.java:231)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317)\n\t\tat io.netty.handler.ssl.SslHandler.exceptionCaught(SslHandler.java:1204)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317)\n\t\tat io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)\n\t\tat io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325)\n\t\tat io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)\n\t\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:125)\n\t\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:177)\n\t\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)\n\t\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)\n\t\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)\n\t\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\t\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\t\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\t\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\t\tat java.base/java.lang.Thread.run(Thread.java:840)\nCaused by: java.net.SocketException: Connection reset\n\tat java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394)\n\tat java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426)\n\tat io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:255)\n\tat io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)\n\tat io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\n"}

Expected behavior

Successful login every time we use SSO

Your installation details

  1. 4de0d53

kafka: clusters: - name: test bootstrapServers: "***" logging: level: "debug" io.kafbat.ui: DEBUG org.springframework.http.codec.json.Jackson2JsonEncoder: DEBUG org.springframework.http.codec.json.Jackson2JsonDecoder: DEBUG reactor.netty.http.server.AccessLog: DEBUG org.springframework.security: DEBUG auth: type: OAUTH2 oauth2: client: test: clientId: "***" clientSecret: "***" scope: ["openid", "email", "groups"] client-name: github authorization-grant-type: authorization_code authorization-uri: https://***/auth redirect-uri: https://***/login/oauth2/code/test provider: "***" user-name-attribute: email token-uri: https://***/token issuer-uri: https://***/ custom-params: type: oauth roles-field: groups rbac: roles:

Steps to reproduce

Since it happens really randomly it is hard to provide steps to reproduce, it happens sometimes when using SSO with custom OIDC and Github. Tried using older versions as well, with every version it happened.

Screenshots

No response

Logs

No response

Additional context

No response

ognjenVlad avatar Jul 08 '24 14:07 ognjenVlad

Hi ognjenVlad! 👋

Welcome, and thank you for opening your first issue in the repo!

Please wait for triaging by our maintainers.

As development is carried out in our spare time, you can support us by sponsoring our activities or even funding the development of specific issues. Sponsorship link

If you plan to raise a PR for this issue, please take a look at our contributing guide.

github-actions[bot] avatar Jul 08 '24 14:07 github-actions[bot]

which oidc/oauth provider do you use?

Haarolean avatar Jul 12 '24 15:07 Haarolean

Dex https://github.com/dexidp/dex, which doesn't log any errors, authentication is succesful. Just 500 Connection reset occurs on kafbat side.

ognjenVlad avatar Jul 15 '24 08:07 ognjenVlad

Could you please try adding this env var SERVER_REACTIVE_SESSION_TIMEOUT: "86400" and observing if there are any changes?

Haarolean avatar Jul 16 '24 05:07 Haarolean

It is still happening, but seems like it is harder to reproduce if that makes sense. Thanks

ognjenVlad avatar Jul 17 '24 11:07 ognjenVlad

We'd need a minimal reproducible example to be able to reproduce and fix this (if there's anything to fix). Feel free to use our keycloak setup example if needed.

Haarolean avatar Aug 01 '24 18:08 Haarolean

Further user feedback is requested. Please reply within 7 days or we might close the issue.

kapybro[bot] avatar Aug 01 '24 18:08 kapybro[bot]

No feedback received within 7 days. Auto closing.

kapybro[bot] avatar Aug 08 '24 18:08 kapybro[bot]

This looks like your authorization server might be behind some LoadBalancer which can silently drop idle connections after timeout. For example, behind AWS NLB, which has timeout of 350 seconds. This is a known problem (e.g. see https://github.com/reactor/reactor-netty/issues/1774) with clients which will pool TCP connections like netty which is used in kafka-ui.

In this case you can try to fix it by instructing netty to remove pooled connections before timeout. E.g. by setting env var JAVA_OPTS : -Dreactor.netty.pool.maxIdleTime=30000 -Dreactor.netty.pool.maxLifeTime=60000

sashaozz avatar Sep 11 '24 05:09 sashaozz

This actually fixed it! @sashaozz Thank you very much

ognjenVlad avatar Oct 10 '24 12:10 ognjenVlad