spring-cloud-alibaba
Running service occasionally throws “com.alibaba.nacos.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception”
Main dependencies and versions: spring-cloud-gateway-3.1.4 spring-cloud-starter-alibaba-nacos-discovery-2021.0.4 spring-cloud-starter-alibaba-nacos-config-2021.0.4 nacos-client-2.1.2 nacos-server-2.1.0
Scenario: the gateway service starts normally, but it occasionally throws com.alibaba.nacos.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception. Ports 8848 and 9848 are both working normally.
Full exception stack trace:
2024-01-15 08:02:52.739 ERROR {"appName":"bigdata-di","thread":"nacos-grpc-client-executor-myj-hngz-prod-cse-mdszh2.nacos.cse.com-31165","className":"com.alibaba.nacos.common.utils.LoggerUtils","methodName":"printIfErrorEnabled","codeLine":"102"}|-> [1705274581653_198.19.131.62_3051]Request stream error, switch server,error={}
com.alibaba.nacos.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
    at com.alibaba.nacos.shaded.io.grpc.Status.asRuntimeException(Status.java:539)
    at com.alibaba.nacos.shaded.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487)
    at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:563)
    at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
    at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:744)
    at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:723)
    at com.alibaba.nacos.shaded.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at com.alibaba.nacos.shaded.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:258)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 1 common frames omitted
2024-01-15 08:02:52.740 INFO {"appName":"bigdata-di","thread":"com.alibaba.nacos.client.remote.worker","className":"com.alibaba.nacos.common.utils.LoggerUtils","methodName":"printIfInfoEnabled","codeLine":"63"}|-> [095cfdd8-092c-4e52-8707-23a05612aef6] Try to reconnect to a new server, server is not appointed, will choose a random server.
2024-01-15 08:02:52.742 INFO {"appName":"bigdata-di","thread":"com.alibaba.nacos.client.remote.worker","className":"com.alibaba.nacos.common.remote.client.grpc.GrpcClient","methodName":"createNewManagedChannel","codeLine":"182"}|-> grpc client connection server:xxx.xxx.xx.xx ip,serverPort:9848,grpcTslConfig:{"sslProvider":"","enableTls":false,"mutualAuthEnable":false,"trustAll":false}
2024-01-15 08:02:52.887 INFO {"appName":"bigdata-di","thread":"com.alibaba.nacos.client.remote.worker","className":"com.alibaba.nacos.common.utils.LoggerUtils","methodName":"printIfInfoEnabled","codeLine":"63"}|-> [095cfdd8-092c-4e52-8707-23a05612aef6] Success to connect a server [xxx.xxx.xx.xx:8848], connectionId = 1705276972764_198.19.130.14_1073
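Request for help: I don't know where to start troubleshooting. Any guidance would be appreciated, thanks.
Could this be network jitter? Check connectivity to Nacos port 9848.
If this is a container deployment, try opening ports 9848, 9849, and 8848 (the Nacos gRPC ports are the main port plus offsets of 1000 and 1001).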
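The port arithmetic above can be sketched as a small probe. This is only an illustration, not part of Nacos: the class name `NacosPortProbe`, the default host, and the 2-second timeout are assumptions. It derives 9848 and 9849 from the main port 8848 and tries a plain TCP connect to each. A successful connect only shows the port is reachable at that moment; it does not rule out an intermittent reset later.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class NacosPortProbe {
    // Offsets relative to the Nacos main (HTTP) port, as mentioned in the thread.
    static final int SDK_GRPC_OFFSET = 1000;     // client-to-server gRPC (8848 -> 9848)
    static final int CLUSTER_GRPC_OFFSET = 1001; // server-to-server gRPC (8848 -> 9849)

    static int sdkGrpcPort(int mainPort) {
        return mainPort + SDK_GRPC_OFFSET;
    }

    static int clusterGrpcPort(int mainPort) {
        return mainPort + CLUSTER_GRPC_OFFSET;
    }

    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the real Nacos server address as the first argument.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int mainPort = 8848;
        int[] ports = {mainPort, sdkGrpcPort(mainPort), clusterGrpcPort(mainPort)};
        for (int port : ports) {
            System.out.printf("%s:%d reachable=%b%n", host, port, canConnect(host, port, 2000));
        }
    }
}
```

Running it from inside the gateway container, rather than from a developer machine, gives a result closer to what the Nacos client actually sees.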
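Could you provide a demo that reproduces the problem? It is hard to reproduce locally, so I can't verify it.
It is a container deployment, and ports 9848, 9849, and 8848 are all open. It really is hard to reproduce locally because the exception only happens occasionally. What caught my attention is the "Caused by: java.io.IOException: Connection reset by peer" message, which suggests that either the server or the client actively closed the connection. Could a mismatch between client and server be the cause? Are there compatibility issues between these two versions? nacos-client-2.1.2 nacos-server-2.1.0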
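To tell this kind of failure apart from other UNAVAILABLE errors in application logs, one option is to walk the exception's cause chain and check whether the root cause is an IOException whose message mentions a connection reset. The sketch below is illustrative only: `ResetDetector` is a hypothetical helper, and a plain RuntimeException stands in for the shaded StatusRuntimeException so the example has no Nacos dependency.

```java
import java.io.IOException;

public class ResetDetector {
    /** Walks the cause chain and returns the deepest Throwable. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** True if the root cause is an IOException whose message mentions a connection reset. */
    static boolean isConnectionReset(Throwable t) {
        Throwable root = rootCause(t);
        String message = root.getMessage();
        return root instanceof IOException
                && message != null
                && message.contains("Connection reset");
    }

    public static void main(String[] args) {
        // Stand-in for "StatusRuntimeException: UNAVAILABLE: io exception"
        // wrapping "IOException: Connection reset by peer", as in the log above.
        RuntimeException wrapped = new RuntimeException(
                "UNAVAILABLE: io exception",
                new IOException("Connection reset by peer"));
        System.out.println("connection reset: " + isConnectionReset(wrapped));
    }
}
```

A reset seen on the client side means the connection was torn down by the remote end or by something on the network path, so counting how often it occurs and at what times can help correlate it with server restarts or idle-connection cleanup in the container network.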
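I'm not sure whether nacos-client-2.1.2 and nacos-server-2.1.0 have compatibility issues. You can refer to the component versions recommended by SCA: https://github.com/alibaba/spring-cloud-alibaba/wiki/%E7%89%88%E6%9C%AC%E8%AF%B4%E6%98%8E
You could try excluding the existing nacos-client dependency from spring-cloud-starter-alibaba-nacos-discovery and adding one whose version matches your nacos-server: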
<!-- Exclude the nacos-client that the starter pulls in transitively -->
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
    <exclusions>
        <exclusion>
            <groupId>com.alibaba.nacos</groupId>
            <artifactId>nacos-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- Declare nacos-client explicitly; set the nacos.client property to the version matching nacos-server -->
<dependency>
    <groupId>com.alibaba.nacos</groupId>
    <artifactId>nacos-client</artifactId>
    <version>${nacos.client}</version>
</dependency>
Does the traffic between the client and the server pass through Nginx or any other proxy?
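No, everything connects to the Nacos server directly through the SDK. The error is very hard to reproduce locally, but it shows up occasionally once the service is deployed in containers.
@ruansheng8 What is odd is that there is no pattern. Other business components do not report similar errors; only this component, which integrates spring-cloud-gateway, reports the error once or twice now and then. Could gRPC conflict with the gateway component in some way? Has a similar issue come up before?
@chengyouling You can use Maven Helper locally to check for dependency conflicts, or ask the Nacos community whether they have run into something similar.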
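As a runtime complement to Maven Helper, the sketch below lists every classpath location that provides a given class, which can reveal duplicate copies packaged into the deployed image. It is a hypothetical helper (`ClasspathDuplicates` is not part of any library), and the candidate class names are assumptions chosen for illustration. The stack trace above shows that nacos-client uses shaded gRPC and Netty packages, so a direct clash with the gateway's own Netty seems less likely, but the check is cheap.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class ClasspathDuplicates {
    /** Converts a fully qualified class name into its classpath resource path. */
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every classpath location that provides the given class. */
    static List<URL> locations(String className) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        Enumeration<URL> urls = loader.getResources(toResourcePath(className));
        List<URL> found = new ArrayList<>();
        while (urls.hasMoreElements()) {
            found.add(urls.nextElement());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Candidate classes to inspect; these names are illustrative assumptions.
        String[] candidates = {
            "com.alibaba.nacos.shaded.io.grpc.ManagedChannel",
            "io.grpc.ManagedChannel",
            "io.netty.channel.Channel"
        };
        for (String name : candidates) {
            List<URL> found = locations(name);
            String note = found.size() > 1 ? " <-- more than one copy" : "";
            System.out.println(name + ": " + found.size() + " location(s)" + note);
            for (URL url : found) {
                System.out.println("    " + url);
            }
        }
    }
}
```

The printed jar paths usually include the artifact version, so the same output also shows which nacos-client version is actually loaded in the container.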
This issue has been open 30 days with no activity. This will be closed in 7 days.