Consumers in a consumer group stuck for 4 days after ErrOffsetOutOfRange error
Description
We recently noticed in our staging and prod environments that consumer groups were stuck for more than 4 days and were not consuming messages from their partitions. After we restarted the pods, they started working again.
Related Issue: #2682
Versions
| Sarama | Kafka | Go |
|---|---|---|
| 1.42.1 | 3.4.1 | 1.21 |
Configuration
cfg.Consumer.Group.Rebalance.GroupStrategies = []sarama.BalanceStrategy{sarama.NewBalanceStrategyRoundRobin()}
cfg.Consumer.Offsets.Initial = sarama.OffsetNewest
cfg.Consumer.Group.Session.Timeout = time.Second * time.Duration(SESSION_TIMEOUT)
cfg.Consumer.Group.Heartbeat.Interval = time.Second * time.Duration(CONSUMER_HEARTBEAT)
cfg.Consumer.Return.Errors = true
cfg.Consumer.Fetch.Min = 100 * 1024 // 100 KB
cfg.Consumer.Fetch.Default = 2 * 1024 * 1024 // 2 MB
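For anyone reproducing this setup: a small sketch of how the two second-based settings above could be sanity-checked before they are assigned to the config. `CheckGroupTimeouts` is a hypothetical helper, not part of Sarama, and the values 30 and 10 are made up because the real `SESSION_TIMEOUT` and `CONSUMER_HEARTBEAT` values are not shown here. The one-third rule reflects the guidance in Sarama's config comments as I understand it, so treat it as a guideline rather than a hard limit.

```go
package main

import (
	"fmt"
	"time"
)

// CheckGroupTimeouts converts second-based settings into durations and
// rejects combinations where the heartbeat interval is not lower than the
// session timeout. Hypothetical helper, not a Sarama API.
func CheckGroupTimeouts(sessionSeconds, heartbeatSeconds int) (session, heartbeat time.Duration, err error) {
	session = time.Second * time.Duration(sessionSeconds)
	heartbeat = time.Second * time.Duration(heartbeatSeconds)
	switch {
	case sessionSeconds <= 0 || heartbeatSeconds <= 0:
		err = fmt.Errorf("timeouts must be positive, got session=%v heartbeat=%v", session, heartbeat)
	case heartbeat >= session:
		err = fmt.Errorf("heartbeat interval %v must be lower than session timeout %v", heartbeat, session)
	}
	return session, heartbeat, err
}

// ExceedsOneThird reports whether the heartbeat interval is above one third
// of the session timeout, which is usually considered too tight.
func ExceedsOneThird(session, heartbeat time.Duration) bool {
	return heartbeat > session/3
}

func main() {
	// Hypothetical values standing in for SESSION_TIMEOUT and CONSUMER_HEARTBEAT.
	session, heartbeat, err := CheckGroupTimeouts(30, 10)
	if err != nil {
		fmt.Println("invalid settings:", err)
		return
	}
	fmt.Println("session:", session, "heartbeat:", heartbeat,
		"above one third:", ExceedsOneThird(session, heartbeat))
}
```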
Logs
[ERRR] 2024/04/04 17:45:01 [Sarama Consumer Error]: kafka: error while consuming results.default/5: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:44:47 [Sarama Consumer Error]: kafka: error while consuming results.default/1: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:44:38 [Sarama Consumer Error]: kafka: error while consuming results.default/13: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:44:37 [Sarama Consumer Error]: kafka: error while consuming results.default/11: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:43:56 [Sarama Consumer Error]: kafka: error while consuming results.default/6: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:43:28 [Sarama Consumer Error]: kafka: error while consuming results.default/3: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:39:16 [Sarama Consumer Error]: kafka: error while consuming results.default/0: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:39:13 [Sarama Consumer Error]: kafka: error while consuming results.default/14: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:39:03 [Sarama Consumer Error]: kafka: error while consuming results.default/4: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:38:51 [Sarama Consumer Error]: kafka: error while consuming results.default/12: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:38:42 [Sarama Consumer Error]: kafka: error while consuming results.default/10: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 17:38:34 [Sarama Consumer Error]: kafka: error while consuming results.default/2: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 16:45:02 [Sarama Consumer Error]: kafka: error while consuming results.default/8: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 16:44:53 [Sarama Consumer Error]: kafka: error while consuming results.default/15: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 16:44:07 [Sarama Consumer Error]: kafka: error while consuming results.default/9: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[ERRR] 2024/04/04 16:44:02 [Sarama Consumer Error]: kafka: error while consuming results.default/7: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
Additional Context
We faced the same behaviour with this error as well: Request exceeded the user-specified time limit in the request
@dnwe can you help with this issue? We faced the same problem again: consumers getting stuck for a long period of time.
@shubham-dogra-s1 👋🏻 thanks for getting in touch
The first thing to double check would be your consumer group lag vs the topic retention. If the group is too far behind in committed offset to keep up with the retention, then it is possible the log has been truncated and your client is trying to consume from an older offset that no longer exists
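The check described above can be sketched as a plain comparison of three numbers per partition. This is only an illustration: `ClassifyOffset` and the status names are hypothetical, and the offsets in `main` are invented. With Sarama, the oldest and newest offsets could presumably be read via `Client.GetOffset` with `sarama.OffsetOldest` and `sarama.OffsetNewest`, and the committed offset via `ClusterAdmin.ListConsumerGroupOffsets`.

```go
package main

import "fmt"

// OffsetStatus describes where a committed offset sits relative to the
// range of offsets the broker still retains for a partition.
type OffsetStatus int

const (
	OffsetInRange      OffsetStatus = iota
	OffsetNotCommitted              // the group has no committed offset yet
	OffsetTruncated                 // committed offset is older than the oldest retained one
	OffsetAhead                     // committed offset is beyond the end of the log
)

func (s OffsetStatus) String() string {
	switch s {
	case OffsetInRange:
		return "in range"
	case OffsetNotCommitted:
		return "not committed"
	case OffsetTruncated:
		return "truncated by retention"
	case OffsetAhead:
		return "ahead of log end"
	default:
		return "unknown"
	}
}

// ClassifyOffset compares a committed offset with the oldest retained offset
// and the log-end offset (the offset the next produced message will get).
// A negative committed offset is treated as "no commit" (Kafka reports -1).
// A committed offset equal to newest is valid: the group is fully caught up.
func ClassifyOffset(committed, oldest, newest int64) OffsetStatus {
	switch {
	case committed < 0:
		return OffsetNotCommitted
	case committed < oldest:
		return OffsetTruncated
	case committed > newest:
		return OffsetAhead
	default:
		return OffsetInRange
	}
}

func main() {
	// Invented numbers: the group committed 1200, but retention has already
	// removed everything below 5000.
	fmt.Println(ClassifyOffset(1200, 5000, 9000))
}
```

A partition that comes back as truncated is one where a fetch from the committed offset would be expected to fail with the out-of-range error shown in the logs.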
@dnwe yes that is possible.
I can see that the library code resets the offset when it gets an ErrOffsetOutOfRange error: https://github.com/IBM/sarama/blob/4ad35041300e1c15ba58b630745ac8eb05f30c10/consumer_group.go#L1123 Even though the error is handled, it still somehow results in an infinite loop.
But we recently faced the same issue with another error: Request exceeded the user-specified time limit in the request. I guess the same thing is happening here as well. From the logs attached below: error while consuming results.priority/53: read tcp i/o timeout. The consumer is trying to consume from partition 53 of results.priority and the request timed out (possibly because the requested offset is no longer available).
After restarting the pods, the consumers started working again.
Attaching some more logs for the Request exceeded the user-specified time limit in the request error, for easier debugging:
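To make the reset behaviour concrete, here is a library-free model of the pattern as I read it from the linked line: try the committed offset, and if that is out of range, retry once at the configured initial offset. All names are hypothetical stand-ins (`ErrOutOfRange` for `sarama.ErrOffsetOutOfRange`, `open` for opening the partition consumer), and in Sarama the retry appears to be gated by `Consumer.Group.ResetInvalidOffsets`. This is not Sarama's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOutOfRange is a stand-in for sarama.ErrOffsetOutOfRange.
var ErrOutOfRange = errors.New("offset out of range")

// OpenWithReset tries to open a partition at the committed offset. If that
// fails with ErrOutOfRange and resetEnabled is true, it retries exactly once
// at the fallback offset. It returns the offset that was actually used.
func OpenWithReset(open func(offset int64) error, committed, fallback int64, resetEnabled bool) (int64, error) {
	offset := committed
	err := open(offset)
	if errors.Is(err, ErrOutOfRange) && resetEnabled {
		offset = fallback
		err = open(offset)
	}
	if err != nil {
		return 0, err
	}
	return offset, nil
}

func main() {
	const newest = int64(-1) // sentinel meaning "start at the newest offset"

	// Fake broker: only offsets >= 100 are still retained.
	open := func(offset int64) error {
		if offset == newest || offset >= 100 {
			return nil
		}
		return ErrOutOfRange
	}

	used, err := OpenWithReset(open, 50, newest, true)
	fmt.Println("offset used:", used, "error:", err)
}
```

If this reading is right, the fallback runs only at the moment a claim is opened. An out-of-range error that surfaces later, while the partition is already being consumed, would not pass through this path until the claim is created again, which might be related to the stall described here.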
Logs from client
[ERRR][Sarama Consumer Error]: kafka: error while consuming results.priority/53: read tcp i/o timeout
[ERRR][Sarama Consumer Error]: kafka: error while consuming results.priority/0: read tcp i/o timeout
[ERRR][Sarama Consumer Error]: kafka: error while consuming results.on_demand/49: read tcp i/o timeout
[ERRR][Sarama Consumer Error]: kafka: error while consuming results.on_demand/41: read tcp i/o timeout
[ERRR][Sarama Consumer Error]: kafka: error while consuming results.on_demand/0: read tcp i/o timeout
Kafka Exporter Logs
E0411 07:12:22.868864 1 kafka_exporter.go:598] Cannot get offset of group results.on_demand tcp i/o timeout
Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.