
Consumer hangs when session times out while using static groups

Open mandeep39 opened this issue 4 years ago • 0 comments

Description

I am using node-rdkafka, which is a wrapper around librdkafka, and have set it up to use static group membership. The following error was observed:

Consumer group session timed out (in join-state started) after 10000 ms without a successful response from the group coordinator (broker 3, last error was Success

followed by

Fatal consumer error: Broker: Static consumer fenced by other consumer with same group.instance.id

Looking at https://github.com/edenhill/librdkafka/blob/master/src/rdkafka_cgrp.c#L4916 , it appears that librdkafka attempts to rejoin the consumer group when a session timeout occurs. As far as I understand, with static membership the consumer never explicitly leaves the consumer group. In that case, would heartbeats still be sent continuously to the group coordinator, keeping the existing partition assignment alive and thereby preventing the rejoin from proceeding, since there is already an assignment for the given static group instance id? Granted, there may be network issues, but I am curious whether that would produce the behavior I am seeing, and whether there is anything defensive I can put in place, such as increasing the heartbeat interval.
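For reference when reasoning about the timing settings, here is a minimal sketch of a sanity check on the relationship between `session.timeout.ms` and `heartbeat.interval.ms`. The helper `checkGroupTimings` and its types are hypothetical and not part of librdkafka or node-rdkafka. The one-third ratio comes from the Apache Kafka consumer configuration documentation, which recommends keeping the heartbeat interval at or below one third of the session timeout. The values 10000 ms and 3000 ms are an assumption based on the 10000 ms reported in the log line above.

```typescript
// Hypothetical helper for illustration; not part of librdkafka or node-rdkafka.
interface GroupTimings {
  sessionTimeoutMs: number;    // value intended for session.timeout.ms
  heartbeatIntervalMs: number; // value intended for heartbeat.interval.ms
}

interface TimingReport {
  heartbeatsPerSession: number;     // whole heartbeat intervals that fit in one session timeout
  withinOneThirdGuideline: boolean; // true if heartbeat interval <= session timeout / 3
}

function checkGroupTimings(t: GroupTimings): TimingReport {
  if (t.sessionTimeoutMs <= 0 || t.heartbeatIntervalMs <= 0) {
    throw new RangeError("timings must be positive");
  }
  return {
    heartbeatsPerSession: Math.floor(t.sessionTimeoutMs / t.heartbeatIntervalMs),
    withinOneThirdGuideline: t.heartbeatIntervalMs * 3 <= t.sessionTimeoutMs,
  };
}

// Assumed values: 10000 ms session timeout, 3000 ms heartbeat interval.
const reportedTimings = checkGroupTimings({
  sessionTimeoutMs: 10000,
  heartbeatIntervalMs: 3000,
});
console.log(reportedTimings);
```

Under these assumed values only three heartbeat intervals fit inside one session timeout, so a few consecutive missed responses are enough to expire the session. By this measure, a longer heartbeat interval narrows the margin, while a longer session timeout widens it.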

How to reproduce

Set up a node-rdkafka application to connect to a broker using static group membership. Settings are shown below for the client.

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • [x] librdkafka version (release number or git tag): 1.6.1
  • [X] Apache Kafka version: 2.6.1
  • [X] librdkafka client configuration: group.instance.id, session.timeout.ms=10ms, heartbeat.interval.ms=3ms
  • [X] Operating system: RHEL Linux 8.3
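As an illustration only, the client configuration listed above might be written as a plain configuration object like the following. The property names are standard librdkafka configuration keys. The broker address, group id, and instance id are hypothetical placeholders, and the millisecond values are an assumption taken from the 10000 ms in the reported log line rather than from the checklist.

```typescript
// Sketch of a static-membership consumer configuration.
// Keys are librdkafka configuration properties; values marked "hypothetical"
// are placeholders that do not come from the original report.
const staticMemberConfig: Record<string, string | number> = {
  "bootstrap.servers": "localhost:9092",     // hypothetical broker address
  "group.id": "example-group",               // hypothetical consumer group
  "group.instance.id": "example-instance-1", // hypothetical; must be unique per running consumer
  "session.timeout.ms": 10000,               // assumed from the log line (10000 ms)
  "heartbeat.interval.ms": 3000,             // assumed to be 3000 ms
};

console.log(Object.keys(staticMemberConfig).join(", "));
```

An object of this shape is what would typically be passed as the global configuration when constructing a consumer in a librdkafka wrapper.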

mandeep39, Jun 18 '21 16:06