sarama icon indicating copy to clipboard operation
sarama copied to clipboard

Sarama has stuck on the batch with zero records length.

Open pachmu opened this issue 3 years ago • 5 comments

Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.

Sarama Kafka Go
Version 1.29.1 AWS Managed Kafka 1.17
Configuration

What configuration values are you using for Sarama and Kafka?

                 cfg.Net.MaxOpenRequests = 1
		// consume all messages
		cfg.Consumer.IsolationLevel = sarama.ReadUncommitted
		// require producer idempotency
		cfg.Producer.Idempotent = true
		// require consistency on Kafka side before considering the message produced
		cfg.Producer.RequiredAcks = sarama.WaitForAll
		// specify network keep-alive to reduce 'broken pipe' errors on non-KMS environments
		cfg.Net.KeepAlive = time.Second * 30
		// increase processing time from 1s (default) to 5s
		cfg.Consumer.MaxProcessingTime = time.Second * 5
Logs

When filing an issue please provide logs from Sarama and Kafka if at all possible. You can set sarama.Logger to a log.Logger to capture Sarama debug output.

logs: CLICK ME

2021/10/22 10:38:26.384689 lvl=INF component=kafka records batch &{5805968 49 2 snappy 0 false false 0 2021-09-30 11:45:34.617 +0200 CEST 2021-09-30 11:45:34.617 +0200 CEST 82609 0 1563 [0xc00030e150] false false [] 1326}
2021/10/22 10:38:26.384705 lvl=INF component=kafka records batch &{5806083 49 2 snappy 0 false false 0 2021-09-30 12:00:01.188 +0200 CEST 2021-09-30 12:00:01.188 +0200 CEST 82609 0 1678 [0xc00030e1c0] false false [] 733}
2021/10/22 10:38:26.384728 lvl=INF component=kafka records batch &{5806313 49 2 snappy 0 false false 0 2021-09-30 12:29:55.542 +0200 CEST 2021-09-30 12:29:55.542 +0200 CEST 82609 0 1908 [0xc00030e230] false false [] 10546}
2021/10/22 10:38:26.384751 lvl=INF component=kafka records batch &{5806399 49 2 snappy 0 false false 0 2021-09-30 12:42:07.755 +0200 CEST 2021-09-30 12:42:07.755 +0200 CEST 82609 0 1994 [0xc00030e2a0] false false [] 1130}
2021/10/22 10:38:26.384772 lvl=INF component=kafka records batch &{5806689 49 2 snappy 0 false false 0 2021-09-30 13:25:25.119 +0200 CEST 2021-09-30 13:25:25.119 +0200 CEST 82609 0 2284 [0xc00030e380] false false [] 2934}
2021/10/22 10:38:26.384790 lvl=INF component=kafka records batch &{5807057 49 2 snappy 0 false false 0 2021-09-30 14:24:12.229 +0200 CEST 2021-09-30 14:24:12.229 +0200 CEST 82609 0 2652 [0xc00030e3f0] false false [] 3124}
2021/10/22 10:38:26.384807 lvl=INF component=kafka records batch &{5807106 49 2 snappy 0 false false 0 2021-09-30 14:31:33.337 +0200 CEST 2021-09-30 14:31:33.337 +0200 CEST 82609 0 2701 [0xc00030e460] false false [] 2335}
2021/10/22 10:38:26.384824 lvl=INF component=kafka records batch &{5807150 49 2 snappy 0 false false 0 2021-09-30 14:38:49.18 +0200 CEST 2021-09-30 14:38:49.18 +0200 CEST 82609 0 2745 [0xc00030e4d0] false false [] 2917}
2021/10/22 10:38:26.384858 lvl=INF component=kafka records batch &{5807374 49 2 snappy 0 false false 0 2021-09-30 15:12:07.587 +0200 CEST 2021-09-30 15:12:07.587 +0200 CEST 82609 0 2969 [0xc00030e540] false false [] 5744}
2021/10/22 10:38:26.384933 lvl=INF component=kafka records batch &{5807388 49 2 snappy 0 false false 0 2021-09-30 15:14:10.358 +0200 CEST 2021-09-30 15:14:10.358 +0200 CEST 82609 0 2983 [0xc00030e5b0] false false [] 31539}
2021/10/22 10:38:26.384958 lvl=INF component=kafka records batch &{5807391 49 2 snappy 0 false false 0 2021-09-30 15:14:44.664 +0200 CEST 2021-09-30 15:14:44.664 +0200 CEST 82609 0 2986 [0xc00030e620] false false [] 2135}
2021/10/22 10:38:26.385027 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.418552 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.450115 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.481958 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.522805 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.557499 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.593690 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.625917 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.657295 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.775601 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.806612 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.840360 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.871158 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}

Problem Description

I'm using Goka (Sarama based library) and have some problems with reading from the one partition of 4. The consumer has stuck on the one particular batch and rereads it again and again.

Here is some info: This is batch struct:

type RecordBatch struct {
	FirstOffset           int64
	PartitionLeaderEpoch  int32
	Version               int8
	Codec                 CompressionCodec
	CompressionLevel      int
	Control               bool
	LogAppendTime         bool
	LastOffsetDelta       int32
	FirstTimestamp        time.Time
	MaxTimestamp          time.Time
	ProducerID            int64
	ProducerEpoch         int16
	FirstSequence         int32
	Records               []*Record
	PartialTrailingRecord bool
	IsTransactional       bool

	compressedRecords []byte
	recordsLen        int // uncompressed records size
}

I added some logs for all decoded batches and firstly I see this:

2021/10/22 10:38:26.384671 lvl=INF component=kafka records batch &{5805856 49 2 snappy 0 false false 0 2021-09-30 11:29:46.662 +0200 CEST 2021-09-30 11:29:46.662 +0200 CEST 82609 0 1451 [0xc00030e0e0] false false [] 1735}
2021/10/22 10:38:26.384689 lvl=INF component=kafka records batch &{5805968 49 2 snappy 0 false false 0 2021-09-30 11:45:34.617 +0200 CEST 2021-09-30 11:45:34.617 +0200 CEST 82609 0 1563 [0xc00030e150] false false [] 1326}
2021/10/22 10:38:26.384705 lvl=INF component=kafka records batch &{5806083 49 2 snappy 0 false false 0 2021-09-30 12:00:01.188 +0200 CEST 2021-09-30 12:00:01.188 +0200 CEST 82609 0 1678 [0xc00030e1c0] false false [] 733}
2021/10/22 10:38:26.384728 lvl=INF component=kafka records batch &{5806313 49 2 snappy 0 false false 0 2021-09-30 12:29:55.542 +0200 CEST 2021-09-30 12:29:55.542 +0200 CEST 82609 0 1908 [0xc00030e230] false false [] 10546}
2021/10/22 10:38:26.384751 lvl=INF component=kafka records batch &{5806399 49 2 snappy 0 false false 0 2021-09-30 12:42:07.755 +0200 CEST 2021-09-30 12:42:07.755 +0200 CEST 82609 0 1994 [0xc00030e2a0] false false [] 1130}
2021/10/22 10:38:26.384772 lvl=INF component=kafka records batch &{5806689 49 2 snappy 0 false false 0 2021-09-30 13:25:25.119 +0200 CEST 2021-09-30 13:25:25.119 +0200 CEST 82609 0 2284 [0xc00030e380] false false [] 2934}
2021/10/22 10:38:26.384790 lvl=INF component=kafka records batch &{5807057 49 2 snappy 0 false false 0 2021-09-30 14:24:12.229 +0200 CEST 2021-09-30 14:24:12.229 +0200 CEST 82609 0 2652 [0xc00030e3f0] false false [] 3124}
2021/10/22 10:38:26.384807 lvl=INF component=kafka records batch &{5807106 49 2 snappy 0 false false 0 2021-09-30 14:31:33.337 +0200 CEST 2021-09-30 14:31:33.337 +0200 CEST 82609 0 2701 [0xc00030e460] false false [] 2335}
2021/10/22 10:38:26.384824 lvl=INF component=kafka records batch &{5807150 49 2 snappy 0 false false 0 2021-09-30 14:38:49.18 +0200 CEST 2021-09-30 14:38:49.18 +0200 CEST 82609 0 2745 [0xc00030e4d0] false false [] 2917}
2021/10/22 10:38:26.384858 lvl=INF component=kafka records batch &{5807374 49 2 snappy 0 false false 0 2021-09-30 15:12:07.587 +0200 CEST 2021-09-30 15:12:07.587 +0200 CEST 82609 0 2969 [0xc00030e540] false false [] 5744}
2021/10/22 10:38:26.384933 lvl=INF component=kafka records batch &{5807388 49 2 snappy 0 false false 0 2021-09-30 15:14:10.358 +0200 CEST 2021-09-30 15:14:10.358 +0200 CEST 82609 0 2983 [0xc00030e5b0] false false [] 31539}
2021/10/22 10:38:26.384958 lvl=INF component=kafka records batch &{5807391 49 2 snappy 0 false false 0 2021-09-30 15:14:44.664 +0200 CEST 2021-09-30 15:14:44.664 +0200 CEST 82609 0 2986 [0xc00030e620] false false [] 2135}

Looks ok. But then I see:

2021/10/22 10:38:26.385027 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.418552 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.450115 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.481958 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.522805 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.557499 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.593690 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.625917 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.657295 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.775601 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.806612 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.840360 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.871158 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.901792 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}
2021/10/22 10:38:26.934249 lvl=INF component=kafka records batch &{5807565 49 2 none 0 false false 0 0001-01-01 00:00:00 +0000 UTC 2021-09-30 15:43:16.281 +0200 CEST 82609 0 3160 [] false false [] 0}

and it keeps going indefinitely. I dug into the code and figured out that there is no exit from such situation when the records len is 0. Sarama just reads this batch again and again. A high watermark was not reached. We use AWS Managed Kafka in this case. Please help. We are firing.

pachmu avatar Oct 22 '21 10:10 pachmu

Hello! I just hit the same error as you, but your solution doesn't seem to work as the app is stuck in a loop and the offset isn't changing... Any idea how to fix this? :)

2021-12-31T10:32:39.030 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.031 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.031 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.032 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.033 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.034 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.035 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.036 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.037 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.038 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.039 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.040 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.041 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.042 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.042 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.043 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.044 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.045 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.046 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.046 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.047 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.048 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.049 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.050 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.051 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.051 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.052 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.053 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.054 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.055 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.055 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.056 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.057 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.058 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.059 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.060 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.061 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.062 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.062 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.063 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.064 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.065 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131
2021-12-31T10:32:39.066 consumer/broker/1004 received batch with zero records but high watermark was not reached, topic ***, partition 34, offset 15138131

grongor avatar Dec 31 '21 11:12 grongor

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

github-actions[bot] avatar Aug 19 '23 14:08 github-actions[bot]

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

github-actions[bot] avatar Nov 17 '23 20:11 github-actions[bot]

I experienced the same issue with Sarama 1.42.1. Apparently we had some issues with one of our Kafka nodes in the clusters that triggered this behaviour in our Sarama consumers. We have Kafka consumers in both Scala and Go, and only the Sarama client got into this endless loop, the Scala client handled the issue just fine. We are not sure what the issue with Kafka itself was yet, but apparently there is an edge case here with Sarama that make it enter into an endless loop with the error above and stops consuming messages.

kgatv2 avatar Nov 24 '23 07:11 kgatv2

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

github-actions[bot] avatar Apr 05 '24 02:04 github-actions[bot]