starrocks-kubernetes-operator

The FE Leader keeps reporting UnknownHostException

Open yandongxiao opened this issue 8 months ago • 0 comments

Describe the bug

If the number of CN or BE replicas is reduced without a DROP operation being performed on the removed nodes, the FE Leader continuously reports the error shown below:

  1. The error is reported for every BE or CN node that has not been DROPPED.
  2. Each occurrence takes up approximately 5KB of log space.
  3. The error is output every five seconds.
  4. The FE logs become unreadable. Each CN that has not been DROPPED causes the FE to generate about 24 * 60 * 12 * 5KB ≈ 84MB of logs per day.
2024-06-24 10:53:25.176+08:00 WARN (heartbeat mgr|14) [HeartbeatMgr.runAfterCatalogReady():165] get bad heartbeat response: type: BACKEND, status: BAD, msg: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local
Jun 24, 2024 10:53:25 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<16218>: (kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9070)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local, cause=java.lang.RuntimeException: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local: Name or service not known
        at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:223)
        at io.grpc.internal.DnsNameResolver.doResolve(DnsNameResolver.java:282)
        at io.grpc.grpclb.GrpclbNameResolver.doResolve(GrpclbNameResolver.java:63)
        at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:318)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local: Name or service not known
        at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:930)
        at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1543)
        at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
        at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1533)
        at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1386)
        at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1307)
        at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:631)
        at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:219)
        ... 6 more
}
2024-06-24 10:53:25.235+08:00 WARN (starmgr-heartbeatmgr-0|100) [StarletAgent.heartbeat():94] caught GRPC exception when sending heartbeat to worker kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9070, io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local.
2024-06-24 10:53:25.236+08:00 WARN (starmgr-heartbeatmgr-0|100) [StarletAgent.heartbeat():110] sending heartbeat to worker kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9070 failed, GRPC:UNAVAILABLE: Unable to resolve host kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local.
2024-06-24 10:53:30.191+08:00 WARN (heartbeat-mgr-pool-4|201) [HeartbeatMgr$BackendHeartbeatHandler.call():321] backend heartbeat got exception, addr: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9050
org.apache.thrift.transport.TTransportException: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local
        at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[libthrift-0.13.0.jar:0.13.0]
        at com.starrocks.common.GenericPool$ThriftClientFactory.create(GenericPool.java:148) ~[starrocks-fe.jar:?]
        at com.starrocks.common.GenericPool$ThriftClientFactory.create(GenericPool.java:133) ~[starrocks-fe.jar:?]
        at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62) ~[commons-pool2-2.3.jar:2.3]
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1036) ~[commons-pool2-2.3.jar:2.3]
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:356) ~[commons-pool2-2.3.jar:2.3]
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:278) ~[commons-pool2-2.3.jar:2.3]
        at com.starrocks.common.GenericPool.borrowObject(GenericPool.java:101) ~[starrocks-fe.jar:?]
        at com.starrocks.system.HeartbeatMgr$BackendHeartbeatHandler.call(HeartbeatMgr.java:270) ~[starrocks-fe.jar:?]
        at com.starrocks.system.HeartbeatMgr$BackendHeartbeatHandler.call(HeartbeatMgr.java:256) ~[starrocks-fe.jar:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:229) ~[?:?]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
        at java.net.Socket.connect(Socket.java:609) ~[?:?]
        at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[libthrift-0.13.0.jar:0.13.0]
        ... 13 more
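
As a rough check on the log volume this produces, here is a minimal sketch of the arithmetic. It assumes exactly one burst of about 5KB every five seconds (12 per minute) for each node that has not been DROPPED, as the timestamps above suggest. The function and constant names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
ERROR_INTERVAL_S = 5   # one error burst every five seconds (from the report)
BURST_SIZE_KB = 5      # approximate size of one burst (from the report)


def daily_log_volume_mb(stale_nodes: int = 1) -> float:
    """Estimated FE log volume per day, in MB, caused by nodes that were not DROPPED."""
    bursts_per_day = SECONDS_PER_DAY // ERROR_INTERVAL_S  # 17280 bursts
    return stale_nodes * bursts_per_day * BURST_SIZE_KB / 1024


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(f"{n} stale node(s): ~{daily_log_volume_mb(n):.0f} MB/day")
```

The volume grows linearly with the number of stale nodes, so scaling in several CNs at once multiplies the noise accordingly.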

Expected behavior

The operator should control, in a proper way, whether BE/CN nodes are DROPPED when their replica count is reduced.
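
To illustrate what such a DROP step could look like, here is a minimal sketch, not the operator's actual implementation. It assumes the pods follow the StatefulSet naming seen in the logs above (`<statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local`), that scaling in removes the highest ordinals first, and that 9050 is the heartbeat port (as in the `addr:` line of the log). The helper names are hypothetical, and the generated `ALTER SYSTEM DROP ...` statements should be checked against the SQL reference for the StarRocks version in use.

```python
from typing import List


def stale_node_addresses(statefulset: str, service: str, namespace: str,
                         old_replicas: int, new_replicas: int,
                         heartbeat_port: int = 9050) -> List[str]:
    """Addresses of the pods that disappear when scaling from old_replicas to new_replicas.

    Assumes StatefulSet semantics: ordinals new_replicas..old_replicas-1 are removed.
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local:{heartbeat_port}"
        for i in range(new_replicas, old_replicas)
    ]


def drop_statements(addresses: List[str], node_type: str = "COMPUTE NODE") -> List[str]:
    """Build one DROP statement per removed node; node_type is COMPUTE NODE or BACKEND."""
    if node_type not in ("COMPUTE NODE", "BACKEND"):
        raise ValueError(f"unsupported node type: {node_type}")
    return [f'ALTER SYSTEM DROP {node_type} "{addr}";' for addr in addresses]


if __name__ == "__main__":
    # The scenario from the logs: the CN StatefulSet scaled from 1 replica to 0.
    addrs = stale_node_addresses("kube-starrocks-cn", "kube-starrocks-cn-search",
                                 "starrocks", old_replicas=1, new_replicas=0)
    for stmt in drop_statements(addrs):
        print(stmt)
```

The sketch only builds the statements; executing them against the FE, and deciding whether a BE that still holds data should be decommissioned rather than dropped, is left out on purpose.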

Please complete the following information

  • Operator Version: v1.9.6

yandongxiao, Jun 24 '24 03:06