confluent-kafka-dotnet
Confluent-kafka-dotnet - Consumer occasionally dies
Maximum application poll interval
This implies the time between calls to Consume was longer than MaxPollIntervalMs (max.poll.interval.ms), which is 5 minutes by default. Can your processing of each message take longer than this?
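For long-running per-message processing, the interval can be raised in the consumer config. A minimal sketch (the broker address, group id, and chosen interval are placeholders, not values from this thread):

```csharp
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",  // placeholder broker
    GroupId = "my-consumer-group",        // hypothetical group id
    // Allow up to 15 minutes between Consume() calls before the client
    // is considered stalled and leaves the group (default: 300000 = 5 min).
    MaxPollIntervalMs = 900000,
    // Note: liveness via heartbeats is governed separately by SessionTimeoutMs.
};
```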
Try upgrading to the latest version; if the problem persists, we can investigate further.
There are significant fixes to the consumer post-1.7 that may be related, though this issue specifically is not familiar to me.
A possible workaround would be to dispose of and re-create the consumer periodically (say, every hour).
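That periodic re-creation could be sketched like this (illustrative only: `config`, `cts`, the topic name, and the `Process` handler are assumptions standing in for application code):

```csharp
using Confluent.Kafka;

// Rebuild the consumer every hour as a workaround for a consumer
// that occasionally stops polling.
var recreateInterval = TimeSpan.FromHours(1);
while (!cts.IsCancellationRequested)
{
    using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
    consumer.Subscribe("my-topic"); // hypothetical topic
    var deadline = DateTime.UtcNow + recreateInterval;
    while (DateTime.UtcNow < deadline && !cts.IsCancellationRequested)
    {
        var result = consumer.Consume(TimeSpan.FromSeconds(1));
        if (result != null)
            Process(result); // application-specific handler
    }
    consumer.Close(); // leave the group cleanly before disposing
}
```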
You could also assign to the topic partition(s) directly; since your throughput is so low, you have no need for a consumer group.
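A sketch of that manual-assignment approach (topic, partition, and starting offset are placeholders; with Assign there is no group join, so the max.poll.interval.ms eviction logic does not apply):

```csharp
using Confluent.Kafka;

// config as before; note the client still requires a GroupId to be set
// even though no group coordination takes place with manual assignment.
using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
consumer.Assign(new TopicPartitionOffset("my-topic", new Partition(0), Offset.End));

while (!cts.IsCancellationRequested)
{
    var result = consumer.Consume(TimeSpan.FromSeconds(1));
    if (result != null)
        Console.WriteLine(result.Message.Value); // application-specific handling
}
```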
Hello, @mhowlett.
We're facing exactly the same error mentioned in this issue and in https://github.com/confluentinc/confluent-kafka-dotnet/issues/1228. We are using the latest Confluent.Kafka (1.9.3), and our apps in production randomly get stuck from time to time. We have N pods of a service deployed to Kubernetes, all working with the same consumer group. After some period of time, one of the pods suddenly gets stuck in the .Consume(cancellationToken) method.
From this screenshot we can see a log entry written right before .Consume(cancellationToken), at 19:04:40.
After 12 s we got the following error:
2022-11-17T21:04:52+02:00 %4|1668711892.043|MAXPOLL|rdkafka#consumer-6| [thrd:main]: Application maximum poll interval (10000ms) exceeded by 247ms (adjust max.poll.interval.ms for long-running message processing): leaving group
Our max.poll.interval.ms is set to 10 s, so once that interval elapsed we got the poll error.
Here is a look at our code, including the attached log.
So we do reach that point, and our cancellationToken has no cancellation requested, but the consumer for some reason doesn't poll at all. According to the summary of the Consume method, it is supposed to poll until it receives a message:
Could you please advise on how to fix this? The worst part is that this error cannot be caught in a try/catch, so we cannot simply rebuild and re-subscribe our consumer.
@mshahins This should not be considered a fatal error. On the subsequent Consume() call, the consumer will rejoin the group and resume consumption.
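A consume loop along those lines might look like this (a sketch, not code from this thread: `cts`, `Process`, and the `Log` call are hypothetical; only errors with IsFatal set warrant tearing the consumer down):

```csharp
using Confluent.Kafka;

while (!cts.IsCancellationRequested)
{
    try
    {
        var result = consumer.Consume(cts.Token);
        Process(result); // application-specific handler
    }
    catch (ConsumeException ex) when (!ex.Error.IsFatal)
    {
        // Non-fatal errors (including having left the group after
        // exceeding max.poll.interval.ms) are transient: the next
        // Consume() call rejoins the group and resumes consumption.
        Log(ex.Error); // hypothetical logger
    }
}
```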
@Plotso Can you provide debug logs and your consumer config?
@anchitj @mhowlett Hi, I want to delete this issue permanently. Can you please delete it? Or let me know how I can reach an admin of this repo to have it deleted. Thanks.
Hi @mshahins, was your issue resolved? If so, sharing the solution will help others.