Consumer stuck in infinite loop during broker restart

Open SathishKumarRamasamy opened this issue 2 years ago • 3 comments

Hi Team,

Brief about our consumer set up:

  • KafkaJS version used: 1.12.0
  • We want exactly-once semantics when consuming from Kafka, so we commit manually before processing the messages in eachBatch. We use the following code and config to achieve it: await batchPayload.resolveOffset(lastOffset); await batchPayload.commitOffsetsIfNecessary();. We also set "autoCommitInterval": -1.
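A minimal sketch of the commit-before-process setup described in the list above, to make the ordering concrete. The helper name makeCommitFirstHandler and the processMessage callback are hypothetical; resolveOffset, commitOffsetsIfNecessary and heartbeat are the fields KafkaJS passes to eachBatch. Note that committing before processing generally behaves as at-most-once delivery: if processing fails after the commit, the batch is not redelivered.

```javascript
// Hypothetical helper (not part of KafkaJS): wraps a per-message function in an
// eachBatch handler that commits the batch's last offset before processing.
function makeCommitFirstHandler(processMessage) {
  return async function eachBatch({ batch, resolveOffset, commitOffsetsIfNecessary, heartbeat }) {
    const { messages } = batch
    if (messages.length === 0) return

    // Mark the whole batch as resolved, then ask KafkaJS to commit.
    const lastOffset = messages[messages.length - 1].offset
    resolveOffset(lastOffset)

    // If the commit rejects (for example with a coordinator error), the error
    // propagates out of eachBatch and nothing below runs.
    await commitOffsetsIfNecessary()

    for (const message of messages) {
      await processMessage(message)
      await heartbeat() // keep the group session alive during long batches
    }
  }
}

// Wiring, assuming an already connected and subscribed `consumer`:
// await consumer.run({
//   autoCommitInterval: -1, // setting taken from the post
//   eachBatch: makeCommitFirstHandler(async (message) => { /* ... */ }),
// })
```

With this shape, a failed commit means the batch is skipped and the exception leaves eachBatch, which matches the behavior described in this post.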

We ran into a strange issue in our Node.js consumer when the Kafka brokers were restarted. A consumer crash event occurred after all retries were exhausted; in such cases we restart the consumer ourselves. The restart itself was fine, but when committing manually to Kafka we got the following error: "This is not the current coordinator of the group". When a commit fails, we don't process the messages; we just throw an exception from eachBatch. We kept getting this error until we manually restarted the whole application. This behavior was consistent across different landscapes for the same consumer.
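For readers unfamiliar with the crash-and-restart flow mentioned here, one possible shape is sketched below. It is an illustration, not the code used in this application and not a confirmed fix. restartOnCrash and startFreshConsumer are hypothetical names; consumer.on, consumer.events.CRASH and consumer.disconnect are KafkaJS APIs. The payload.restart check is an assumption about newer KafkaJS versions; on versions that do not set it, the field is undefined and the manual restart path runs.

```javascript
// Hypothetical helper (not part of KafkaJS): on a crash that KafkaJS will not
// recover from by itself, disconnect the old consumer and start a fresh one.
function restartOnCrash(consumer, startFreshConsumer, log = console) {
  // consumer.on returns a function that removes the listener.
  return consumer.on(consumer.events.CRASH, async ({ payload }) => {
    log.error('consumer crashed:', payload.error)

    // Assumption: newer KafkaJS versions set payload.restart to true when the
    // library restarts the consumer itself; older versions leave it undefined.
    if (payload.restart) return

    try {
      await consumer.disconnect()
    } catch (err) {
      log.error('disconnect failed:', err)
    }
    await startFreshConsumer()
  })
}
```

Here startFreshConsumer would create a new consumer instance, connect, subscribe, call run, and register restartOnCrash again, so that no state from the crashed instance is reused.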

Interestingly, we didn't have any issue with another consumer in the same application that has autoCommit enabled.

Any leads would be really helpful.

Thanks, Sathish

SathishKumarRamasamy • Jul 20 '21 13:07

Hi @SathishKumarRamasamy, you should bump to the latest version first. There were several fixes regarding stale data, etc. I also suggest changing from autoCommitInterval=-1 to autoCommitThreshold=1.

tulios • Jul 20 '21 13:07
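As a rough illustration of the suggested change, the sketch below builds run options with autoCommitThreshold set to 1 instead of autoCommitInterval set to -1. The helper name buildRunOptions is hypothetical. autoCommitThreshold, eachBatchAutoResolve and eachBatch are documented consumer.run options; the KafkaJS docs describe autoCommitThreshold as committing after a given number of resolved messages, so with a value of 1 a call to commitOffsetsIfNecessary should commit once at least one offset has been resolved. Whether this removes the coordinator error is not established in this thread.

```javascript
// Hypothetical helper: builds the options object passed to consumer.run.
function buildRunOptions(eachBatch) {
  return {
    autoCommitThreshold: 1, // value suggested in the reply above
    eachBatchAutoResolve: false, // assumption: the handler resolves offsets itself
    eachBatch,
  }
}

// Usage, assuming an already connected and subscribed `consumer`
// and an existing eachBatch handler named `handler`:
// await consumer.run(buildRunOptions(handler))
```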

@tulios: Sure, let me upgrade and observe. We had configured autoCommitInterval=-1 to commit synchronously before we proceed to process the batch. Is there a reason you suggest autoCommitThreshold=1 instead?

SathishKumarRamasamy • Jul 21 '21 04:07

@tulios, @Nevon: We upgraded to 1.15.0, but I still observe this issue sporadically. I kept "autoCommitInterval": -1 because I commit once per batch and want the commit to take effect immediately. Do you think that is the reason for this issue? Could you please give me some pointers on what to check?

SathishKumarRamasamy • Oct 13 '21 12:10