LoneKing

Results 7 comments of LoneKing

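1. On the host machine, kafka-consumer-groups.sh shows the information within about 2 s. From the machine running KnowStreaming, connecting to the cluster with kafkactl also returns the information in about 2 s. Both are in the same cloud region. One thing puzzles me: from the user's point of view the error is instantaneous. The error message pops up as soon as the arrow button is clicked in the front end, with no sense of waiting for a timeout.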

> > > > Which endpoint exactly? And what is the corresponding error log?

The endpoint is /ks-km/api/v3/clusters/1/topics/xxxxxxxxxx/groups/xxxxxxxxxxxxxxx/metric

![image](https://github.com/didi/KnowStreaming/assets/11244921/80b116cc-0a7c-4466-a2cb-0d767e487710)

`2023-07-19 14:58:40.883 [MetricCollect-Shard-1-9-thread-79] ERROR class=c.x.k.s.k.core.service.group.impl.GroupServiceImpl||method=getGroupOffset||clusterPhyId=1|groupName=wangyou_transport||errMsg=exception! java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=findCoordinator, deadlineMs=1689749980882, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s) at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)...

> > MetricCollect-Shard-1-9-thread-79
>
> 1. This is not the right log. It comes from the metric collection thread; the front-end page issues an HTTP request, so the thread stack would not look like this.
> 2. The message "The AdminClient thread has exited" appears here. Check whether any log shows the AdminClient being closed.

### 1. Error message after the API call

**I added test logging to the API method. When the call reaches groupService.getGroupOffsetFromKafka inside GroupManagerImpl.pagingGroupTopicConsumedMetrics, the exception is thrown immediately; there is no noticeable delay or timeout.**

2023-07-26 18:07:26.829 [ApiCallTP-4-thread-2] ERROR class=c.x.k.s.k.core.service.group.impl.GroupServiceImpl||method=getGroupOffset||clusterPhyId=1|groupName=integration_payment_logs||errMsg=exception! java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=findCoordinator, deadlineMs=1690366106829, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807...
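The numbers in the log line above can be checked against the "no waiting" observation. The sketch below is only a sanity check, not KnowStreaming or Kafka code: the class and method names are hypothetical, and it assumes the log timestamps are in UTC+8. Under that assumption, `deadlineMs` lies 60000 ms after the moment the error was logged, so the call failed well before its own deadline, and the "timed out at" value is simply `Long.MAX_VALUE` rather than a real clock reading.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class DeadlineCheck {
    // "timed out at" value copied from the log line.
    public static final long TIMED_OUT_AT = 9223372036854775807L;

    // Milliseconds between the log timestamp and the call's deadline.
    // ASSUMPTION: the log timestamp is local time in UTC+8.
    public static long millisUntilDeadline(String logTimestamp, long deadlineMs) {
        Instant logged = LocalDateTime.parse(logTimestamp.replace(' ', 'T'))
                .toInstant(ZoneOffset.ofHours(8));
        return Duration.between(logged, Instant.ofEpochMilli(deadlineMs)).toMillis();
    }

    public static void main(String[] args) {
        long remaining = millisUntilDeadline("2023-07-26 18:07:26.829", 1690366106829L);
        System.out.println("ms left before deadline when the error was logged: " + remaining);
        System.out.println("timed-out-at is Long.MAX_VALUE: " + (TIMED_OUT_AT == Long.MAX_VALUE));
    }
}
```

A failure logged a full minute before the deadline suggests that the call was rejected up front rather than after waiting on the broker, which matches what the front end shows.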

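### 1. Nothing showed up in the info log, so I added separate logging configuration

I added logging to the **remove** method of KafkaAdminClient and also to **createKafkaAdminClient**. The create logs are all there, but the remove method was never called, not even after the "The AdminClient thread has exited" error was reported.

### 2. jstack output

[jstack1.txt](https://github.com/didi/KnowStreaming/files/12182162/jstack1.txt)

### Going through the detailed logs to pinpoint when the error occurs: it is here, in the getGroupOffsetFromKafka method of GroupServiceImpl

The error is raised the instant partitionsToOffsetAndMetadata().get() is called

`Map offsetAndMetadataMap = listConsumerGroupOffsetsResult.partitionsToOffsetAndMetadata().get(); `

### Log after the API call, with the exception information

2023-07-27 18:25:12.216 [http-nio-8080-exec-1] INFO Test - start...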
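An instant failure at `get()` is what one would expect if the future had already been completed exceptionally before `get()` was called, because `get()` then has nothing to wait for. The following is a minimal sketch of that behaviour using only the JDK: `CompletableFuture` and `java.util.concurrent.TimeoutException` stand in for Kafka's `KafkaFuture` and `org.apache.kafka.common.errors.TimeoutException`, and the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class FailedFutureDemo {
    // Calls get() on a future that is already failed and describes the cause.
    public static String causeOfImmediateFailure() throws InterruptedException {
        CompletableFuture<String> future = new CompletableFuture<>();
        // Stand-in for a client that rejects the call before sending anything.
        future.completeExceptionally(new TimeoutException("The AdminClient thread has exited."));
        try {
            future.get(); // does not block: the future is already completed
            return null;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        String cause = causeOfImmediateFailure();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(cause + " (get() returned after " + elapsedMs + " ms)");
    }
}
```

The exception type is still a timeout wrapped in an `ExecutionException`, as in the logs above, even though no time was spent waiting.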

> > > > Could you please provide the Kafka client debug logs covering the period from before the KafkaAdminClient is created until after the "The AdminClient thread has exited" message appears?

In test.log, 2023-07-28 10:13:29.787 is the start of the API request and 2023-07-28 10:13:29.789 is "The AdminClient thread has exited."

[kafka_client.log](https://github.com/didi/KnowStreaming/files/12192227/kafka_client.log) [test.2023-07-28.log](https://github.com/didi/KnowStreaming/files/12192231/test.2023-07-28.log)

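> > > > > > > > > > > > > Could you please provide the Kafka client debug logs covering the period from before the KafkaAdminClient is created until after the "The AdminClient thread has exited" message appears?
> > > > > > In test.log, 2023-07-28 10:13:29.787 is the start of the API request and 2023-07-28 10:13:29.789...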

> > Uncaught exception in thread
>
> 1. This is most likely what caused the thread to exit, which in the end led to the client being treated as closed. I will look into how to fix it.
>
> ```
> 2023-08-03 11:14:20.488 [kafka-admin-client-thread | ApacheAdminClient||clusterPhyId=1||Cnt=0] ERROR class=org.apache.kafka.common.utils.KafkaThread||Uncaught exception in thread 'kafka-admin-client-thread | ApacheAdminClient||clusterPhyId=1||Cnt=0':
> java.lang.RuntimeException: non-nullable field...
> ```
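The failure mode described here, where a background thread dies from an uncaught exception and every later request is then rejected on the spot, can be reproduced in miniature. The sketch below is a simplified stand-in and not how Kafka's admin client is implemented: the class, the thread name, and the "simulated failure" message are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class DeadWorkerDemo {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final AtomicReference<Throwable> fatal = new AtomicReference<>();
    private final Thread worker;

    public DeadWorkerDemo() {
        worker = new Thread(() -> {
            while (true) {
                Runnable task;
                try {
                    task = queue.take();
                } catch (InterruptedException e) {
                    return;
                }
                task.run(); // a RuntimeException here escapes and ends the thread
            }
        }, "demo-client-thread");
        // Remember what killed the worker so later callers can see the root cause.
        worker.setUncaughtExceptionHandler((t, e) -> fatal.set(e));
        worker.setDaemon(true);
        worker.start();
    }

    // Rejects work immediately once the worker is gone, without any waiting.
    public void submit(Runnable task) {
        if (!worker.isAlive()) {
            throw new IllegalStateException("worker thread has exited", fatal.get());
        }
        queue.add(task);
    }

    public static String run() throws InterruptedException {
        DeadWorkerDemo demo = new DeadWorkerDemo();
        demo.submit(() -> { throw new RuntimeException("simulated failure"); });
        demo.worker.join(); // wait until the uncaught exception has killed the worker
        try {
            demo.submit(() -> { });
            return "accepted";
        } catch (IllegalStateException e) {
            return e.getMessage() + " / cause: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

Nothing in this sketch ever calls a close or remove method, yet the object behaves as if it were closed, which is consistent with the earlier observation that remove was never invoked.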