kafka-go
kafka-go copied to clipboard
check remaining bytes when marking batch as empty
Fixes issue-1188 and drastically reduce the number of reader errors and dials. Across a couple hundred reader instances, we saw the number of reader errors drop from a couple hundred per minute to 0.
When hwm==offset and remaining bytes in the buffer is not zero, it means the buffer contains information that has not been replicated yet, as least according to the hwm. However from our testing, this is mostly entirely due to race condition (i.e. we asked for hwm too soon, before the message is replicated.), and hwm always catches up so there doesn't seem to be any harm reading those bytes in the buffer.
Alternatively, we could discard the remaining bytes in the buffer when we mark the batch empty based on hwm==offset, similar to this PR.
@zachxu42 this seems like a plausible fix but given that we've broken kafka-go before trying to fix this issue I would feel a lot better if we first found some way to:
- Reproduce this issue in a test
- Add test coverage to prevent us from breaking this in a similar way as we did in https://github.com/segmentio/kafka-go/pull/788
We spent some more time comparing this to https://github.com/segmentio/kafka-go/pull/1177 and we think that is more of a root cause fix, but I think we'll still need to make some testing investments prior to proceeding with it
Thanks @petedannemann for the comment. I believe #1177 is a different case, when the error is not nil. In this PR I could change it to do the same
if highWaterMark == offset {
msgs = &messageSetReader{empty: true}
if remain > 0 {
c.rbuf.Discard(remain)
}
} else {
msgs, err = newMessageSetReader(&c.rbuf, remain)
}
I think the issue is, whether there's an error or not, if we're about to return an empty message set, we need to make sure we clear the buffer if there's any remaining bytes, so the next reader doesn't see partial/corrupt data.