
Confluent-kafka-dotnet - Consumer occasionally dies

Open mshahins opened this issue 2 years ago • 9 comments

mshahins avatar Oct 05 '22 13:10 mshahins

Maximum application poll interval

This implies the time between calls to Consume was longer than MaxPollIntervalMs (max.poll.interval.ms), which is 5 minutes by default. Can your processing of each message take longer than this?
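For reference, here is a minimal sketch of where that limit is set and which part of a consume loop counts against it. The broker address, group id, and topic name are placeholders and do not come from this issue; the value shown is just the default made explicit.

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

class PollIntervalSketch
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // placeholder
            GroupId = "example-group",           // placeholder
            // Maximum allowed gap between Consume calls before the consumer
            // leaves the group. 300000 ms (5 minutes) is the default.
            MaxPollIntervalMs = 300000,
        };

        using var cts = new CancellationTokenSource();
        Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };

        using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
        consumer.Subscribe("example-topic"); // placeholder

        try
        {
            while (!cts.IsCancellationRequested)
            {
                var result = consumer.Consume(cts.Token);

                // Everything done between here and the next Consume call
                // counts against MaxPollIntervalMs.
                Console.WriteLine($"{result.TopicPartitionOffset}: {result.Message.Value}");
            }
        }
        catch (OperationCanceledException)
        {
            // Ctrl+C was pressed; fall through to Close.
        }
        finally
        {
            consumer.Close(); // leave the group cleanly
        }
    }
}
```

If per-message processing can approach the limit, the usual options are to raise the setting or to hand the work off so the loop gets back to Consume sooner.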

mhowlett avatar Oct 05 '22 15:10 mhowlett

Try upgrading to the latest version; if the problem persists, we can investigate further.

mhowlett avatar Oct 06 '22 17:10 mhowlett

There are significant fixes to the consumer after 1.7 that may be related, though this specific issue is not familiar to me.

mhowlett avatar Oct 07 '22 13:10 mhowlett

A possible workaround would be to dispose and re-create the consumer periodically (say every hour).
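A rough sketch of that workaround, under some assumptions: the one-hour lifetime comes from the suggestion above, while the `Run` helper, the one-second consume timeout, and the connection settings are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using Confluent.Kafka;

class RecreatingConsumerSketch
{
    // How long one consumer instance lives before being replaced.
    static readonly TimeSpan Lifetime = TimeSpan.FromHours(1);

    // Hypothetical helper: consumes until cancelled, rebuilding the consumer
    // whenever the current instance is older than Lifetime.
    static void Run(ConsumerConfig config, string topic, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
            consumer.Subscribe(topic);
            var age = Stopwatch.StartNew();
            try
            {
                while (!token.IsCancellationRequested && age.Elapsed < Lifetime)
                {
                    // Bounded wait so the lifetime check also runs when idle.
                    var result = consumer.Consume(TimeSpan.FromSeconds(1));
                    if (result == null) continue; // timed out, no message

                    Console.WriteLine(result.Message.Value);
                }
            }
            finally
            {
                consumer.Close(); // leave the group before disposing
            }
        }
    }

    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // placeholder
            GroupId = "example-group",           // placeholder
        };

        using var cts = new CancellationTokenSource();
        Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };

        Run(config, "example-topic", cts.Token); // placeholder topic
    }
}
```

Each re-creation makes the consumer leave and rejoin the group, so the other members will see a rebalance roughly once per lifetime.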

mhowlett avatar Oct 28 '22 02:10 mhowlett

You could also assign to the topic partition(s) directly. Since your throughput is so low, you don't have any need for a consumer group.
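A minimal sketch of direct assignment, assuming a topic with two partitions (the topic name, partition numbers, and connection settings are placeholders). `Assign` replaces `Subscribe`, so the consumer takes no part in group rebalancing.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using Confluent.Kafka;

class DirectAssignSketch
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // placeholder
            // The client still expects a group.id; with Assign it is used
            // for committed offsets rather than for group membership.
            GroupId = "example-group",           // placeholder
        };

        using var cts = new CancellationTokenSource();
        Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };

        using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();

        // Take the partitions explicitly instead of subscribing to the topic.
        consumer.Assign(new List<TopicPartition>
        {
            new TopicPartition("example-topic", 0),
            new TopicPartition("example-topic", 1),
        });

        try
        {
            while (!cts.IsCancellationRequested)
            {
                var result = consumer.Consume(cts.Token);
                Console.WriteLine($"{result.TopicPartitionOffset}: {result.Message.Value}");
            }
        }
        catch (OperationCanceledException)
        {
            // Ctrl+C was pressed.
        }
        finally
        {
            consumer.Close();
        }
    }
}
```

The trade-off is that partitions are no longer distributed automatically: with several instances, each one has to be told which partitions it owns.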

mhowlett avatar Oct 28 '22 02:10 mhowlett

Hello, @mhowlett.

We're facing exactly the same error mentioned in this issue and in https://github.com/confluentinc/confluent-kafka-dotnet/issues/1228. We are using the latest Confluent Kafka client (1.9.3), and our apps in production randomly get stuck from time to time. We have N pods of a service deployed to k8s, all working with the same consumerGroup. After some period of time, one of the pods suddenly gets stuck in the .Consume(cancellationToken) method.

[image]

From this picture we can see that we have a log right before .Consume(cancellationToken), logged at 19:04:40. After 12s we got the following error:

2022-11-17T21:04:52+02:00 %4|1668711892.043|MAXPOLL|rdkafka#consumer-6| [thrd:main]: Application maximum poll interval (10000ms) exceeded by 247ms (adjust max.poll.interval.ms for long-running message processing): leaving group

Our max.poll.interval.ms is set to 10s, so after that time we suddenly got the poll error. Here is a look at our code, including the attached log.

[image] [image]

So, we do reach that point, our cancellationToken has no cancellation requested, but the consumer for some reason doesn't poll at all. According to the summary of the consume method, it's supposed to poll until it receives a message:

[image]

Could you please advise on how to fix this? The worst part is that the error above cannot be caught in try/catch, so we cannot just rebuild and re-subscribe our consumer.

Plotso avatar Nov 17 '22 21:11 Plotso

@mshahins This should not be considered a fatal error. On the subsequent Consume() call the consumer will rejoin the group and resume consumption.
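A sketch of a loop that keeps going after such an error. It assumes the condition surfaces from Consume as a ConsumeException carrying ErrorCode.Local_MaxPollExceeded, which is worth verifying against the client version in use; connection settings and the topic name are placeholders.

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

class KeepConsumingSketch
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // placeholder
            GroupId = "example-group",           // placeholder
        };

        using var cts = new CancellationTokenSource();
        Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };

        using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
        consumer.Subscribe("example-topic"); // placeholder

        try
        {
            while (!cts.IsCancellationRequested)
            {
                try
                {
                    var result = consumer.Consume(cts.Token);
                    Console.WriteLine(result.Message.Value);
                }
                catch (ConsumeException ex) when (!ex.Error.IsFatal)
                {
                    // Assumption: a max-poll violation arrives here with
                    // ex.Error.Code == ErrorCode.Local_MaxPollExceeded.
                    Console.Error.WriteLine($"Consume error {ex.Error.Code}: {ex.Error.Reason}");
                    // Not fatal: loop around and call Consume again.
                }
            }
        }
        catch (OperationCanceledException)
        {
            // Ctrl+C was pressed.
        }
        finally
        {
            consumer.Close();
        }
    }
}
```

Fatal errors are deliberately not swallowed by the filter, so they still propagate and end the loop.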

@Plotso Can you provide debug logs and your consumer config?
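One way to collect that information is sketched below. The debug contexts chosen (`consumer,cgrp`) are a guess at what is relevant here, and the connection settings and topic name are placeholders.

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

class DebugLoggingSketch
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // placeholder
            GroupId = "example-group",           // placeholder
            // librdkafka debug contexts; "consumer,cgrp" focuses on polling
            // and group membership. "all" is much noisier.
            Debug = "consumer,cgrp",
        };

        // Dump the effective client-side settings.
        // Take care not to share credentials if any are configured.
        foreach (var kv in config)
        {
            Console.WriteLine($"{kv.Key}={kv.Value}");
        }

        using var cts = new CancellationTokenSource();
        Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };

        using var consumer = new ConsumerBuilder<Ignore, string>(config)
            // Receives librdkafka log lines, including the debug output.
            .SetLogHandler((_, log) =>
                Console.WriteLine($"{log.Level} {log.Facility} {log.Name}: {log.Message}"))
            // Receives client-level errors reported outside of Consume.
            .SetErrorHandler((_, err) =>
                Console.Error.WriteLine($"error {err.Code} fatal={err.IsFatal}: {err.Reason}"))
            .Build();

        consumer.Subscribe("example-topic"); // placeholder

        try
        {
            while (!cts.IsCancellationRequested)
            {
                var result = consumer.Consume(cts.Token);
                Console.WriteLine(result.Message.Value);
            }
        }
        catch (OperationCanceledException)
        {
            // Ctrl+C was pressed.
        }
        finally
        {
            consumer.Close();
        }
    }
}
```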

anchitj avatar Nov 30 '22 08:11 anchitj

@anchitj @mhowlett - Hi, I want to delete this issue permanently. Can you please delete it? Or let me know how I can reach out to the admin of this repo to have it deleted. Thanks

mshahins avatar Mar 09 '23 16:03 mshahins

Hi @mshahins, did your issue get resolved? If yes, sharing the solution will help others.

ksdvishnukumar avatar Oct 06 '23 16:10 ksdvishnukumar