clj-kafka

Offset management

Open cddr opened this issue 9 years ago • 6 comments

Hey folks,

What are your thoughts about the new method of managing offsets in Kafka? There's some documentation (in the form of example code) here:

https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka

The TL;DR is that maintaining offsets in ZooKeeper carries quite a bit of overhead, so there's another approach: write the offsets to a topic and keep an in-memory cache of the current offset. That way consumers with high throughput, or lots of consumer groups (or both), can still commit after processing each message rather than having to limit the frequency of commits. Would you like clj-kafka to provide something like this?
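To make the idea concrete, here is a rough sketch in plain Clojure. It is only a toy model: an atom holding a vector stands in for the offsets topic, and a second atom holds the in-memory cache. None of these names (new-store, commit-offset!, fetch-offset, rebuild-cache) come from clj-kafka or Kafka; they are made up for illustration, and the real broker-side mechanism is more involved.

```clojure
;; Toy model only. `:log` stands in for the offsets topic and `:cache`
;; for the in-memory view of it. All names here are hypothetical.
(defn new-store []
  {:log   (atom [])    ; append-only record of every commit
   :cache (atom {})})  ; latest offset per [group topic partition]

(defn commit-offset!
  "Appends the commit to the log, then updates the cache."
  [{:keys [log cache]} group topic partition offset]
  (let [k [group topic partition]]
    (swap! log conj [k offset])
    (swap! cache assoc k offset)
    offset))

(defn fetch-offset
  "Returns the last committed offset from the cache, or nil if none."
  [{:keys [cache]} group topic partition]
  (get @cache [group topic partition]))

(defn rebuild-cache
  "Replays log entries in order to recover the cache, e.g. after a restart.
  Later entries for the same key overwrite earlier ones."
  [log-entries]
  (reduce (fn [m [k offset]] (assoc m k offset)) {} log-entries))

;; Usage: commit after every message, read back the latest offset.
(def store (new-store))
(commit-offset! store "group-a" "events" 0 41)
(commit-offset! store "group-a" "events" 0 42)
(fetch-offset store "group-a" "events" 0)
```

In this model reads never touch the log; they are served from the cache, which is what makes committing after every message affordable. The log exists so the cache can be rebuilt.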

cddr avatar Sep 15 '15 19:09 cddr

It's definitely interesting, although I'd probably lean to this being an add-on lib that people could pull in, I guess as a kind of offset strategy.

Having said that, I'm not overly familiar with the development, but I think upcoming releases of Kafka will have a broker API suitable for centrally managing offsets: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommit/FetchAPI

Again, I'd probably err on the side of, as far as possible, letting people choose whichever offset strategy they like.

What do you think? Would you be up for developing a clj-kafka equivalent to the example code on the wiki page you posted?

pingles avatar Sep 15 '15 20:09 pingles

D'oh. I've just realised your suggestion uses the API I found :)

Haha. Yep, definitely up for adding support. I'll see if I can get some time this week to have a look. Of course, pull requests are always welcome!

pingles avatar Sep 15 '15 21:09 pingles

Cool!

I think we will need this either way so if you don't get to it, we'll get to it soon enough. Just wanted to check before digging in. Thanks for this library. It's been working great for us so far.

cddr avatar Sep 15 '15 21:09 cddr

Hey @pingles. Just letting you know, I probably won't get to this any time soon, as my company appears to be leaning towards using Samza, which handles this stuff itself.

cddr avatar Oct 09 '15 17:10 cddr

This looks like it was done in open PR #64.

ottbot avatar Oct 20 '15 21:10 ottbot

Thanks for the reminder. We'll try to take a look this week and get it merged. Apologies for the delay; I've been busy with some unrelated stuff at work.

pingles avatar Oct 21 '15 07:10 pingles