spring-cloud-alibaba

Running service occasionally throws “com.alibaba.nacos.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception”

Open chengyouling opened this issue 1 year ago • 7 comments

Main dependencies and versions: spring-cloud-gateway-3.1.4, spring-cloud-starter-alibaba-nacos-discovery-2021.0.4, spring-cloud-starter-alibaba-nacos-config-2021.0.4, nacos-client-2.1.2, nacos-server-2.1.0

Scenario: the gateway service starts normally, but it occasionally throws com.alibaba.nacos.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception. Ports 8848 and 9848 are both working normally.

Full exception stack trace:

2024-01-15 08:02:52.739 ERROR {"appName":"bigdata-di","thread":"nacos-grpc-client-executor-myj-hngz-prod-cse-mdszh2.nacos.cse.com-31165","className":"com.alibaba.nacos.common.utils.LoggerUtils","methodName":"printIfErrorEnabled","codeLine":"102"}|-> [1705274581653_198.19.131.62_3051]Request stream error, switch server,error={}
com.alibaba.nacos.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
	at com.alibaba.nacos.shaded.io.grpc.Status.asRuntimeException(Status.java:539)
	at com.alibaba.nacos.shaded.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487)
	at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:563)
	at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
	at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:744)
	at com.alibaba.nacos.shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:723)
	at com.alibaba.nacos.shaded.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at com.alibaba.nacos.shaded.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:192)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:258)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at com.alibaba.nacos.shaded.io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 common frames omitted
2024-01-15 08:02:52.740 INFO {"appName":"bigdata-di","thread":"com.alibaba.nacos.client.remote.worker","className":"com.alibaba.nacos.common.utils.LoggerUtils","methodName":"printIfInfoEnabled","codeLine":"63"}|-> [095cfdd8-092c-4e52-8707-23a05612aef6] Try to reconnect to a new server, server is not appointed, will choose a random server.
2024-01-15 08:02:52.742 INFO {"appName":"bigdata-di","thread":"com.alibaba.nacos.client.remote.worker","className":"com.alibaba.nacos.common.remote.client.grpc.GrpcClient","methodName":"createNewManagedChannel","codeLine":"182"}|-> grpc client connection server:xxx.xxx.xx.xx ip,serverPort:9848,grpcTslConfig:{"sslProvider":"","enableTls":false,"mutualAuthEnable":false,"trustAll":false}
2024-01-15 08:02:52.887 INFO {"appName":"bigdata-di","thread":"com.alibaba.nacos.client.remote.worker","className":"com.alibaba.nacos.common.utils.LoggerUtils","methodName":"printIfInfoEnabled","codeLine":"63"}|-> [095cfdd8-092c-4e52-8707-23a05612aef6] Success to connect a server [xxx.xxx.xx.xx:8848], connectionId = 1705276972764_198.19.130.14_1073
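
The trace above is a wrapped exception: the shaded gRPC layer reports UNAVAILABLE, while the actual trigger is in the Caused by section (java.io.IOException: Connection reset by peer). As a minimal sketch of how to pull that innermost cause out programmatically, the plain-Java helper below walks the cause chain. The class and method names are hypothetical and not part of Nacos or gRPC, and a plain RuntimeException stands in for the shaded StatusRuntimeException.

```java
import java.io.IOException;

// Hypothetical helper for illustration; not part of Nacos or gRPC.
final class RootCause {

    private RootCause() {
    }

    // Follows getCause() until the innermost exception is reached.
    // The depth cap guards against a cyclic cause chain.
    static Throwable of(Throwable error) {
        Throwable current = error;
        for (int depth = 0; depth < 64 && current.getCause() != null; depth++) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the shaded StatusRuntimeException seen in the log.
        RuntimeException wrapped = new RuntimeException(
                "UNAVAILABLE: io exception",
                new IOException("Connection reset by peer"));

        Throwable root = RootCause.of(wrapped);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Logging the root cause separately can make it easier to tell transport-level resets apart from other UNAVAILABLE errors.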

Request for help: I don't know where to start troubleshooting. Any guidance would be appreciated, thanks.

chengyouling avatar Jan 17 '24 08:01 chengyouling

Could this be a network fluctuation issue? Check port 9848 on Nacos.

If it is deployed in containers, try opening ports 9848, 9849, and 8848 (Nacos uses port offsets of 1000 and 1001 from the main port).

Could you provide a demo that reproduces it? It is hard to reproduce locally, so there is no way to verify the problem.
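
The port check suggested above can be sketched roughly as follows. This is only an illustration: the class name, method names, default host, and timeout are assumptions, and the offsets 1000 and 1001 are taken from the comment above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of the Nacos client.
final class NacosPortCheck {

    // Offsets from the main server port, as stated in the comment above.
    static final int CLIENT_GRPC_OFFSET = 1000;
    static final int SERVER_GRPC_OFFSET = 1001;

    private NacosPortCheck() {
    }

    static int clientGrpcPort(int mainPort) {
        return mainPort + CLIENT_GRPC_OFFSET;
    }

    static int serverGrpcPort(int mainPort) {
        return mainPort + SERVER_GRPC_OFFSET;
    }

    // Returns true if a TCP connection can be opened within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and main port are placeholders; pass your own as arguments.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int mainPort = args.length > 1 ? Integer.parseInt(args[1]) : 8848;

        int[] ports = {mainPort, clientGrpcPort(mainPort), serverGrpcPort(mainPort)};
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
        }
    }
}
```

A successful connect only shows that the port is open at that moment. It does not rule out an established connection being reset later, so it is best run from inside the same container network as the failing service.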

yuluo-yx avatar Jan 18 '24 02:01 yuluo-yx

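It is deployed in containers, and ports 9848, 9849, and 8848 are all open. It really is hard to reproduce locally; the exception only happens occasionally. What caught my attention is the message Caused by: java.io.IOException: Connection reset by peer, which suggests that either the server or the client actively closed the connection. Could a mismatch between the client and the server be the cause? Is there a compatibility problem between these two versions? nacos-client-2.1.2 nacos-server-2.1.0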

chengyouling avatar Jan 18 '24 02:01 chengyouling

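> It is deployed in containers, and ports 9848, 9849, and 8848 are all open. It really is hard to reproduce locally; the exception only happens occasionally. What caught my attention is the message Caused by: java.io.IOException: Connection reset by peer, which suggests that either the server or the client actively closed the connection. Could a mismatch between the client and the server be the cause? Is there a compatibility problem between these two versions? nacos-client-2.1.2 nacos-server-2.1.0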

I'm not sure whether nacos-client-2.1.2 and nacos-server-2.1.0 have compatibility issues. You can refer to the component versions recommended by SCA: https://github.com/alibaba/spring-cloud-alibaba/wiki/%E7%89%88%E6%9C%AC%E8%AF%B4%E6%98%8E

You could try excluding the existing nacos-client dependency from spring-cloud-starter-alibaba-nacos-discovery and bringing in a version that matches your nacos-server:

<!-- Exclude the nacos-client that the starter pulls in transitively -->
<dependency>
	<groupId>com.alibaba.cloud</groupId>
	<artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
	<exclusions>
		<exclusion>
			<groupId>com.alibaba.nacos</groupId>
			<artifactId>nacos-client</artifactId>
		</exclusion>
	</exclusions>
</dependency>

<!-- Declare nacos-client explicitly; define the nacos.client property yourself, e.g. set it to the nacos-server version -->
<dependency>
	<groupId>com.alibaba.nacos</groupId>
	<artifactId>nacos-client</artifactId>
	<version>${nacos.client}</version>
</dependency>
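
After changing the dependency as above, it may help to confirm which jar actually ends up on the runtime classpath. A minimal sketch, assuming the class com.alibaba.nacos.client.naming.NacosNamingService is a reasonable marker for the nacos-client jar (the helper class and its messages are hypothetical):

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper for illustration; not part of Nacos or Spring Cloud Alibaba.
final class JarLocator {

    private JarLocator() {
    }

    // Returns where the named class was loaded from, or a short note
    // when the class is missing or has no code source (e.g. JDK classes).
    static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers.
            Class<?> type = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "no code source";
            }
            URL location = source.getLocation();
            return location.toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Assumed marker class for the nacos-client jar; override via the first argument.
        String className = args.length > 0
                ? args[0]
                : "com.alibaba.nacos.client.naming.NacosNamingService";
        System.out.println(className + " -> " + locate(className));
    }
}
```

The printed location normally includes the jar file name, and with it the version, so it can be compared against the version declared in the pom. It needs to run with the same classpath as the gateway service to be meaningful.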

yuluo-yx avatar Jan 18 '24 02:01 yuluo-yx

Does the traffic between the client and the server pass through Nginx or any other forwarding proxy?

ruansheng8 avatar Jan 18 '24 12:01 ruansheng8

> Does the traffic between the client and the server pass through Nginx or any other forwarding proxy?

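No. Everything connects to the Nacos server directly through the SDK. The error is very hard to reproduce locally, but it shows up occasionally once the service is deployed in containers.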

chengyouling avatar Jan 19 '24 06:01 chengyouling

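@ruansheng8 What is odd is that there is no pattern to it. No other business component reports a similar error; only this component, which integrates spring-cloud-gateway, fails once or twice now and then. Could gRPC somehow conflict with the gateway component? Has a similar problem been reported before?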
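
One way to check whether the errors really have no pattern is to bucket their timestamps, for example by hour of day. The sketch below assumes log lines start with a timestamp in the same yyyy-MM-dd HH:mm:ss.SSS form as the log posted earlier and contain the text "Request stream error, switch server". The class name and the shortened sample lines are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; the log layout is assumed from the posted log.
final class ResetHistogram {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    private static final int STAMP_LENGTH = 23;
    private static final String MARKER = "Request stream error, switch server";

    private ResetHistogram() {
    }

    // Counts marker lines per hour of day (0-23); lines without a
    // parseable leading timestamp are skipped.
    static Map<Integer, Integer> countByHour(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.length() < STAMP_LENGTH || !line.contains(MARKER)) {
                continue;
            }
            try {
                LocalDateTime time = LocalDateTime.parse(line.substring(0, STAMP_LENGTH), STAMP);
                counts.merge(time.getHour(), 1, Integer::sum);
            } catch (DateTimeParseException e) {
                // Not a log line in the assumed format; ignore it.
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened sample lines; real input would come from the service log file.
        List<String> lines = Arrays.asList(
                "2024-01-15 08:02:52.739 ERROR |-> Request stream error, switch server,error={}",
                "2024-01-15 08:02:52.740 INFO |-> Try to reconnect to a new server",
                "2024-01-16 08:03:10.101 ERROR |-> Request stream error, switch server,error={}");
        System.out.println(countByHour(lines));
    }
}
```

If the counts cluster around particular hours or at regular intervals, that may point to something periodic in the environment rather than to a library conflict.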

chengyouling avatar Jan 22 '24 03:01 chengyouling

@chengyouling You can use Maven Helper locally to check for dependency conflicts, or ask the Nacos community whether they have run into anything similar before.

ruansheng8 avatar Jan 23 '24 12:01 ruansheng8

This issue has been open 30 days with no activity. This will be closed in 7 days.

github-actions[bot] avatar Feb 22 '24 18:02 github-actions[bot]