streamparse icon indicating copy to clipboard operation
streamparse copied to clipboard

Build a pure-Python Kafka Spout.

Open rduplain opened this issue 10 years ago • 4 comments
trafficstars

Related:

rduplain avatar Sep 01 '15 05:09 rduplain

@kbourgoin to define this.

rduplain avatar Oct 09 '15 15:10 rduplain

+1

Working through this with @amontalenti at the moment. Having a "Kafka Spout" base class as part of streamparse would be very nice. Ideally the base class should be smart enough to handle the exceptions coming from PyKafka. Willing to help test this or help develop this with you guys.

jasonrhaas avatar Oct 23 '15 14:10 jasonrhaas

It should also be smart enough to pick up where it left off after topology shut downs, like the JVM-based one does. That's the really hard part.

dan-blanchard avatar Oct 23 '15 15:10 dan-blanchard

From @amontalenti:

Here’s a gist of a first stab at a cleaned up Kafka spout based on our internal Kafka spouts using pykafka balanced consumer.

https://gist.github.com/amontalenti/4f459e5e702cc77bd7ab

It’s a base class that you can override. I cleaned up the docs a little so you can see what you’re supposed to implement on your side. Typically. you implement message validation in is_ok and you implement transformations in get_data. Your subclass can override these two methods and have a spout implemented that is doing balanced consuming and offset management.

Here’s the issue — though this spout does implement next_tuple() and scales up, it doesn’t implement ack() and fail() logic. That’s on you, still. But you can make the Spout reliable by implementing these.

Hope that helps as a starting point! We’re obviously working to make this an official part of the pykafka / pystorm / streamparse experience over time.

rduplain avatar Oct 23 '15 15:10 rduplain