
More graceful ErrOffsetOutOfRange handling

Open tormoder opened this issue 5 years ago • 1 comments

Go version: 1.14.2
Burrow version: github.com/linkedin/Burrow v1.3.4-0.20200506150011-4ce194fea01a

Lately we have been seeing seemingly random ErrOffsetOutOfRange errors from the internal Burrow consumer for two specific partitions of the __consumer_offsets topic. This started after we upgraded our cluster to Kafka version 2.4.

We have not found the cause, and it may be related to our cluster setup. However, right after the ErrOffsetOutOfRange error is logged, Burrow panics with a nil-pointer dereference:

{"level":"ERROR","@timestamp":"2020-05-27T08:14:04.090Z","caller":"runtime/asm_amd64.s:1373","message":"consume error","@version":"1","type":"module","coordinator":"consumer","class":"kafka","name":"prod","topic":"__consumer_offsets","partition":9,"error":"kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition.","stacktrace":"github.com/linkedin/Burrow/core/internal/consumer.(*KafkaClient).partitionConsumer\n\t/go/pkg/mod/github.com/linkedin/!burrow@v1.3.4-0.20200506150011-4ce194fea01a/core/internal/consumer/kafka_client.go:261"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x9ca109]

goroutine 393 [running]:
github.com/linkedin/Burrow/core/internal/consumer.(*KafkaClient).partitionConsumer(0xc00042edc0, 0xdf2b40, 0xc000ddb4a0, 0x0)
        /go/pkg/mod/github.com/linkedin/!burrow@v1.3.4-0.20200506150011-4ce194fea01a/core/internal/consumer/kafka_client.go:262 +0x5e9
created by github.com/linkedin/Burrow/core/internal/consumer.(*KafkaClient).startKafkaConsumer
        /go/pkg/mod/github.com/linkedin/!burrow@v1.3.4-0.20200506150011-4ce194fea01a/core/internal/consumer/kafka_client.go:314 +0x968

The underlying Sarama partition consumer considers the error fatal and requires user action: https://github.com/Shopify/sarama/blob/b5764af1c47d0f6718dba3be6a3d75e8c97b351a/consumer.go#L826-L828
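
For illustration only, here is one way a fatal consumer error could lead to a nil-pointer panic in a read loop, and how to guard against it. This is a minimal sketch that assumes the message channel is closed once the consumer shuts down; I have not confirmed that this is what happens at kafka_client.go:262. The message type and the drain function are hypothetical stand-ins, not Burrow or Sarama code.

```go
package main

import "fmt"

// message is a hypothetical stand-in for a consumed Kafka message.
type message struct {
	Partition int32
	Offset    int64
}

// drain reads from msgs until the channel is closed and reports how many
// messages it handled. A receive from a closed channel yields the zero value
// (a nil pointer here), so the comma-ok form is used to detect the close
// before the message is dereferenced.
func drain(msgs <-chan *message) (handled int, closed bool) {
	for {
		msg, ok := <-msgs
		if !ok {
			// Channel closed: stop instead of touching a nil message.
			return handled, true
		}
		if msg == nil {
			// Defensive: skip an explicit nil sent on an open channel.
			continue
		}
		_ = msg.Offset // safe to dereference here
		handled++
	}
}

func main() {
	msgs := make(chan *message, 2)
	msgs <- &message{Partition: 9, Offset: 100}
	msgs <- &message{Partition: 9, Offset: 101}
	close(msgs) // simulates the consumer shutting down after a fatal error

	handled, closed := drain(msgs)
	fmt.Println(handled, closed)
}
```

Without the ok check, the first receive after the close would return a nil *message, and reading msg.Offset would panic in the same way as the trace above.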

It would be nice if Burrow could handle the error more gracefully, and possibly restart the partition consumer at OffsetNewest.

tormoder, May 27 '20 09:05

It looks like we're hitting https://issues.apache.org/jira/browse/KAFKA-9543, which is causing the ErrOffsetOutOfRange errors.

The question remains whether Burrow should handle the error more gracefully than by crashing.

tormoder, May 28 '20 11:05