Consumer group client does not back-off / retry for possible error scenarios

Open dotnwat opened this issue 4 years ago • 1 comments

Versions
Sarama: 0189d59e5253ee7aa2a9eb50bffd18a69460630d
Kafka: 2.4
Go: (not given)
Configuration
config.Version = sarama.V2_4_0_0
config.Consumer.Return.Errors = true
config.Consumer.Offsets.Initial = sarama.OffsetNewest
Logs
2020/06/02 10:39:50 Initializing new client
2020/06/02 10:39:50 client/metadata fetching metadata for all topics from broker localhost:9092
2020/06/02 10:39:50 Connected to broker at localhost:9092 (unregistered)
2020/06/02 10:39:50 client/brokers registered new broker #1 at 0.0.0.0:9092
2020/06/02 10:39:50 Successfully initialized new client
2020/06/02 10:39:50 client/metadata fetching metadata for [sanfrancisco] from broker localhost:9092
2020/06/02 10:39:50 client/metadata found some partitions to be leaderless
2020/06/02 10:39:50 client/metadata retrying after 250ms... (3 attempts remaining)
2020/06/02 10:39:50 client/metadata fetching metadata for [sanfrancisco] from broker localhost:9092
2020/06/02 10:39:50 client/coordinator requesting coordinator for consumergroup sfo-consumer-group from localhost:9092
2020/06/02 10:39:50 client/coordinator coordinator for consumergroup sfo-consumer-group is #1 (0.0.0.0:9092)
2020/06/02 10:39:50 Connected to broker at 0.0.0.0:9092 (registered as #1)
2020/06/02 10:39:50 ProcessingLoop error:  kafka server: The broker is still loading offsets after a leader change for that offset's topic partition.
Problem Description

When calling Consume on a consumer group client, the Kafka broker returns the error COORDINATOR_LOAD_IN_PROGRESS = 14 in response to the join-group request, but Sarama does not appear to implement back-off / retry logic for this error condition. It only backs off and retries for NOT_COORDINATOR = 16 (see here: https://github.com/Shopify/sarama/blob/master/consumer_group.go#L227).

This case is handled in the Kafka Java reference client (see: https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L609).

In general, COORDINATOR_LOAD_IN_PROGRESS, NOT_COORDINATOR, and COORDINATOR_NOT_AVAILABLE are all valid errors from any of the group membership APIs.

Sarama handles the load-in-progress condition when fetching offsets, but when this error comes back from join group, the client doesn't retry as expected.
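
To make the expected behaviour concrete, here is a minimal sketch of the kind of back-off / retry I have in mind. It is not Sarama's code or API: KafkaError, isRetriable and retryWithBackoff are hypothetical names made up for illustration, and the exponential back-off policy is an assumption. Only the numeric error codes (14, 15, 16) come from the Kafka protocol.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// KafkaError is a hypothetical stand-in for a Kafka protocol error code.
// It is NOT Sarama's type; it only exists to keep this sketch self-contained.
type KafkaError int16

// Error codes as defined by the Kafka protocol.
const (
	CoordinatorLoadInProgress KafkaError = 14
	CoordinatorNotAvailable   KafkaError = 15
	NotCoordinator            KafkaError = 16
)

func (e KafkaError) Error() string {
	return fmt.Sprintf("kafka server: error code %d", int16(e))
}

// isRetriable reports whether err is one of the three coordinator errors
// that any group membership API may return transiently.
func isRetriable(err error) bool {
	var ke KafkaError
	if !errors.As(err, &ke) {
		return false
	}
	switch ke {
	case CoordinatorLoadInProgress, CoordinatorNotAvailable, NotCoordinator:
		return true
	}
	return false
}

// retryWithBackoff calls op up to maxAttempts times (maxAttempts >= 1),
// sleeping between attempts and doubling the delay each time. It stops early
// on success or on a non-retriable error, and returns the number of attempts
// made together with the last error.
func retryWithBackoff(maxAttempts int, initial time.Duration, op func() error) (int, error) {
	backoff := initial
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op()
		if err == nil || !isRetriable(err) {
			return attempt, err
		}
		if attempt < maxAttempts {
			time.Sleep(backoff)
			backoff *= 2
		}
	}
	return maxAttempts, err
}

func main() {
	// Simulated join-group call: the coordinator is still loading for the
	// first two attempts, then the join succeeds.
	calls := 0
	joinGroup := func() error {
		calls++
		if calls < 3 {
			return CoordinatorLoadInProgress
		}
		return nil
	}

	attempts, err := retryWithBackoff(5, 10*time.Millisecond, joinGroup)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

In a real client the op would be the join-group round trip, and a NOT_COORDINATOR result would presumably also need the cached coordinator to be refreshed before retrying. The sketch leaves that out.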

dotnwat avatar Jun 02 '20 18:06 dotnwat

Closing as a dupe of https://github.com/Shopify/sarama/issues/2058, which is believed to be fixed (via https://github.com/Shopify/sarama/pull/2214) in v1.33.0 and newer.

dnwe avatar Dec 02 '22 10:12 dnwe

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

github-actions[bot] avatar Aug 24 '23 20:08 github-actions[bot]