kafka-protocol-rs

decompressing Snappy encoded RecordBatch fails

Open pdeva opened this issue 1 year ago • 2 comments

Decompressing a Snappy-compressed RecordBatch returns this error:

 snappy: corrupt input (expected valid offset but got offset 20545; dst position: 0)

Switching to another compression codec such as lz4 works fine.

pdeva avatar Sep 14 '24 17:09 pdeva

It seems the bug occurs when the batch is compressed with snappy 'framing' turned on.

This is the code from the kafka-go client. When x.framed is set to true, the decoder in kafka-protocol-rs fails. Framing seems to be on by default in all popular Kafka clients, including the canonical Java client.

func (c *Codec) NewWriter(w io.Writer) io.WriteCloser {
	x, _ := writerPool.Get().(*xerialWriter)
	if x != nil {
		x.Reset(w)
	} else {
		x = &xerialWriter{writer: w}
	}
	x.framed = c.Framing == Framed
	switch c.Compression {
	case FasterCompression:
		x.encode = s2.EncodeSnappy
	case BetterCompression:
		x.encode = s2.EncodeSnappyBetter
	case BestCompression:
		x.encode = s2.EncodeSnappyBest
	default:
		x.encode = snappy.Encode // aka. s2.EncodeSnappyBetter
	}
	return &writer{xerialWriter: x}
}

pdeva avatar Sep 14 '24 18:09 pdeva
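
The offset 20545 in the error message is consistent with a raw Snappy decoder being handed a framed stream. A framed (xerial / snappy-java) stream starts with the magic bytes 0x82 "SNAPPY" 0x00, and reading those bytes as raw Snappy gives a copy instruction with offset 0x5041 = 20545 at destination position 0. The sketch below reproduces that arithmetic using only the standard library. The function name `misparse_as_raw_snappy` is hypothetical and is not part of kafka-protocol-rs or the snap crate.

```rust
/// First 8 bytes of a xerial (snappy-java) framed stream: 0x82 then "SNAPPY\0".
const XERIAL_MAGIC: [u8; 8] = [0x82, b'S', b'N', b'A', b'P', b'P', b'Y', 0];

/// Hypothetical helper: reads the fields a raw Snappy decoder would see at the
/// start of `input`, returning (claimed uncompressed length, copy length,
/// copy offset). Returns None if the first element is not a 2-byte-offset copy.
fn misparse_as_raw_snappy(input: &[u8]) -> Option<(u64, usize, usize)> {
    // Raw Snappy begins with the uncompressed length as a little-endian varint.
    let mut len: u64 = 0;
    let mut shift = 0;
    let mut pos = 0;
    loop {
        let b = *input.get(pos)?;
        pos += 1;
        len |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift > 63 {
            return None;
        }
    }
    // Next is a tag byte; low two bits 0b10 mean "copy with 2-byte offset".
    let tag = *input.get(pos)?;
    if tag & 0b11 != 0b10 {
        return None;
    }
    // Copy length is stored in the upper six bits, minus one.
    let copy_len = usize::from(tag >> 2) + 1;
    // The offset follows as a 16-bit little-endian integer.
    let lo = *input.get(pos + 1)?;
    let hi = *input.get(pos + 2)?;
    let offset = usize::from(u16::from_le_bytes([lo, hi]));
    Some((len, copy_len, offset))
}
```

Calling `misparse_as_raw_snappy(&XERIAL_MAGIC)` yields a copy offset of 20545, which matches the number in the reported error. Nothing has been written to the output yet at that point, so the decoder rejects the offset as invalid.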

For compression options, we want to match the official Java client wherever possible, so this would certainly be a bug on our end.

tychedelia avatar Sep 14 '24 19:09 tychedelia

For the record, this is due to the non-standard framed encoding used before the snappy encoding was standardized. It seems the snap crate doesn't support this.

michaelbeaumont avatar Oct 30 '25 13:10 michaelbeaumont
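
As a rough illustration of what handling this framing could look like, here is a minimal sketch that detects the xerial header and splits the payload into its raw Snappy blocks. It assumes the layout written by snappy-java and kafka-go: an 8-byte magic, two big-endian 32-bit version fields, then repeated chunks of a 4-byte big-endian length followed by one raw Snappy block. The names `is_xerial_framed` and `xerial_chunks` are hypothetical and are not taken from kafka-protocol-rs. Each returned chunk would still need to go through a raw Snappy decoder (for example the snap crate's raw decoder), which is left out here to keep the example dependency-free.

```rust
/// Magic bytes at the start of a xerial (snappy-java) framed stream.
const XERIAL_MAGIC: [u8; 8] = [0x82, b'S', b'N', b'A', b'P', b'P', b'Y', 0];
/// Magic (8 bytes) + version (4 bytes) + compatible version (4 bytes).
const XERIAL_HEADER_LEN: usize = 16;

/// Returns true if `buf` starts with a complete xerial header.
fn is_xerial_framed(buf: &[u8]) -> bool {
    buf.len() >= XERIAL_HEADER_LEN && buf.starts_with(&XERIAL_MAGIC)
}

/// Splits `buf` into raw Snappy blocks.
/// Unframed input is returned unchanged as a single block.
fn xerial_chunks(buf: &[u8]) -> Result<Vec<&[u8]>, String> {
    if !is_xerial_framed(buf) {
        return Ok(vec![buf]);
    }
    let mut rest = &buf[XERIAL_HEADER_LEN..];
    let mut chunks = Vec::new();
    while !rest.is_empty() {
        if rest.len() < 4 {
            return Err("truncated chunk length".to_string());
        }
        // Each chunk is prefixed with its compressed size, big-endian.
        let (len_bytes, tail) = rest.split_at(4);
        let len = u32::from_be_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]])
            as usize;
        if tail.len() < len {
            return Err(format!(
                "chunk claims {} bytes but only {} remain",
                len,
                tail.len()
            ));
        }
        let (chunk, next) = tail.split_at(len);
        chunks.push(chunk);
        rest = next;
    }
    Ok(chunks)
}
```

Falling back to a single block for unframed input means the same code path would accept both the framed and the unframed output of the Go writer quoted above. The decompressed blocks are concatenated in order to recover the original bytes.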