starrocks-kubernetes-operator
The FE Leader keeps reporting UnknownHostException
Describe the bug
If the number of CN or BE replicas is reduced without first performing a DROP operation, the FE Leader continuously reports the error shown in the log further down:
- The error is reported for each BE or CN node that has not been DROPPED.
- Each occurrence takes up approximately 5 KB.
- The error is logged every five seconds.
- The FE logs become unreadable. At one occurrence every five seconds (12 per minute), each CN that has not been DROPPED causes the FE to generate about 24 * 60 * 12 * 5 KB ≈ 86 MB of logs per day.
2024-06-24 10:53:25.176+08:00 WARN (heartbeat mgr|14) [HeartbeatMgr.runAfterCatalogReady():165] get bad heartbeat response: type: BACKEND, status: BAD, msg: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local
Jun 24, 2024 10:53:25 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<16218>: (kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9070)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local, cause=java.lang.RuntimeException: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local: Name or service not known
at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:223)
at io.grpc.internal.DnsNameResolver.doResolve(DnsNameResolver.java:282)
at io.grpc.grpclb.GrpclbNameResolver.doResolve(GrpclbNameResolver.java:63)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:318)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local: Name or service not known
at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:930)
at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1543)
at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1533)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1386)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1307)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:631)
at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:219)
... 6 more
}
2024-06-24 10:53:25.235+08:00 WARN (starmgr-heartbeatmgr-0|100) [StarletAgent.heartbeat():94] caught GRPC exception when sending heartbeat to worker kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9070, io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local.
2024-06-24 10:53:25.236+08:00 WARN (starmgr-heartbeatmgr-0|100) [StarletAgent.heartbeat():110] sending heartbeat to worker kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9070 failed, GRPC:UNAVAILABLE: Unable to resolve host kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local.
2024-06-24 10:53:30.191+08:00 WARN (heartbeat-mgr-pool-4|201) [HeartbeatMgr$BackendHeartbeatHandler.call():321] backend heartbeat got exception, addr: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9050
org.apache.thrift.transport.TTransportException: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local
at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[libthrift-0.13.0.jar:0.13.0]
at com.starrocks.common.GenericPool$ThriftClientFactory.create(GenericPool.java:148) ~[starrocks-fe.jar:?]
at com.starrocks.common.GenericPool$ThriftClientFactory.create(GenericPool.java:133) ~[starrocks-fe.jar:?]
at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62) ~[commons-pool2-2.3.jar:2.3]
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1036) ~[commons-pool2-2.3.jar:2.3]
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:356) ~[commons-pool2-2.3.jar:2.3]
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:278) ~[commons-pool2-2.3.jar:2.3]
at com.starrocks.common.GenericPool.borrowObject(GenericPool.java:101) ~[starrocks-fe.jar:?]
at com.starrocks.system.HeartbeatMgr$BackendHeartbeatHandler.call(HeartbeatMgr.java:270) ~[starrocks-fe.jar:?]
at com.starrocks.system.HeartbeatMgr$BackendHeartbeatHandler.call(HeartbeatMgr.java:256) ~[starrocks-fe.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:229) ~[?:?]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
at java.net.Socket.connect(Socket.java:609) ~[?:?]
at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[libthrift-0.13.0.jar:0.13.0]
... 13 more
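The per-node log volume quoted in the description depends on the reporting interval and the size of each entry. The following is a minimal sketch for recomputing it under different assumptions; the function name and parameters are illustrative and not part of StarRocks or the operator. With a strict five-second interval there are 12 entries per minute.

```python
def daily_log_volume_kb(interval_seconds: float, entry_size_kb: float) -> float:
    """Estimate the KB of FE log output per day caused by one undropped node.

    interval_seconds: how often the error is written (assumed constant).
    entry_size_kb:    approximate size of one error entry, stack traces included.
    """
    seconds_per_day = 24 * 60 * 60
    entries_per_day = seconds_per_day / interval_seconds
    return entries_per_day * entry_size_kb


if __name__ == "__main__":
    # Values taken from the report: one entry every 5 s, about 5 KB each.
    kb = daily_log_volume_kb(interval_seconds=5, entry_size_kb=5)
    print(f"{kb / 1000:.1f} MB per undropped node per day")
```

The total scales linearly with the number of nodes that were removed without being dropped.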
Expected behavior
The operator should handle dropping BE/CN nodes properly when the number of replicas is reduced.
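As a rough illustration of the DROP step that is currently skipped, the sketch below builds the SQL statements for the nodes removed by a scale-down. It rests on several assumptions: the helper `drop_statements` and its parameters are hypothetical, the host name pattern is inferred from the log above, port 9050 is the heartbeat port seen in that log, and StatefulSet scale-down removes the highest ordinals first. It only builds strings and does not connect to the FE.

```python
from typing import List


def drop_statements(
    pod_prefix: str,
    search_service: str,
    namespace: str,
    old_replicas: int,
    new_replicas: int,
    node_kind: str = "COMPUTE NODE",       # or "BACKEND"
    heartbeat_port: int = 9050,            # assumption: port seen in the FE log
    cluster_domain: str = "cluster.local",
) -> List[str]:
    """Build ALTER SYSTEM DROP statements for pods removed by a scale-down."""
    if node_kind not in ("COMPUTE NODE", "BACKEND"):
        raise ValueError(f"unsupported node kind: {node_kind}")

    statements = []
    # Ordinals new_replicas .. old_replicas-1 no longer exist after scaling down.
    for ordinal in range(new_replicas, old_replicas):
        host = (
            f"{pod_prefix}-{ordinal}.{search_service}."
            f"{namespace}.svc.{cluster_domain}"
        )
        statements.append(
            f'ALTER SYSTEM DROP {node_kind} "{host}:{heartbeat_port}";'
        )
    return statements


if __name__ == "__main__":
    # Names taken from the log in this report; CN scaled from 1 replica to 0.
    for stmt in drop_statements(
        "kube-starrocks-cn", "kube-starrocks-cn-search", "starrocks", 1, 0
    ):
        print(stmt)
```

The statements would then be run against the FE with any MySQL-compatible client. For BE nodes that hold data, decommissioning before removal is usually the safer path than an immediate DROP.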
Please complete the following information
- Operator Version: v1.9.6