kafka-go icon indicating copy to clipboard operation
kafka-go copied to clipboard

Add kafka-go/records package

Open achille-roussel opened this issue 3 years ago • 0 comments

kafka-go/records

I'm opening this PR to submit a suggestion to add a new package for higher-level components built on top of the kafka-go client APIs. In a similar way to what we did with the topics package, I would like to introduce a new records package in kafka-go.

There are two main APIs that I have implemented in this PR:

  • records.Storage which is an interface that abstracts the storage of opaque values indexed by broker address, topic, partition, and base offset. This interface comes with two implementations, on in memory backed by a Go map, and another using the local file system to record and index the values.
  • records.Cache which implements a caching logic for responses to fetch requests.

I made changes to the way record batches are decoded to delay decompression of the records in order to allow copying the compressed data directly though the cache to the storage layer without requiring re-encoding.

The README has more extensive documentation on the purposes of those new building blocks.

Let me know if you have any questions or feedback!

Breaking Changes

As part of working on this change, I would like to introduce a breaking change to the kafka.RecordReader interface, let me give some context first.

Currently, the record reader interface has a single method ReadRecord. When reading records, the keys and values are abstracted by kafka.Bytes interfaces which is an extension to io.ReadCloser. This model requires applications to close keys and values (after checking that they weren't nil) for every record that they read. In practice, this resource management is unnecessary due to the way we implement the readers underneath, and results in very verbose programs.

In this PR, I have changed the kafka.RecordReader to implement io.Closer, and changes kafka.Bytes to not require closing anymore. Applications now only need to close the record reader received in fetch responses rather than closing each key and value of each record. This model also more closely resembles more closely the design of http.Response in the standard library, where programs need to close the body after consuming it.

Since this is a breaking change, we would likely release this change under a new v0.5 release line as it has been the model we used in the past. While being pre-v1 gives us flexibility to make those types of changes, we need to be cautious about those as they can be costly for users to adapt their applications. I will write a section in the README to document how to migrate from v0.4 to v0.5.

achille-roussel avatar Feb 23 '22 22:02 achille-roussel