streamparse Build a pure-Python Kafka Spout.

trafficstars

pykafka
#83

Sep 01 '15 05:09 rduplain

@kbourgoin to define this.

Oct 09 '15 15:10 rduplain

+1

Working through this with @amontalenti at the moment. Having a "Kafka Spout" base class as part of streamparse would be very nice. Ideally the base class should be smart enough to handle the exceptions coming from PyKafka. Willing to help test this or help develop this with you guys.

Oct 23 '15 14:10 jasonrhaas

It should also be smart enough to pick up where it left off after topology shut downs, like the JVM-based one does. That's the really hard part.

Oct 23 '15 15:10 dan-blanchard

From @amontalenti:

Here’s a gist of a first stab at a cleaned up Kafka spout based on our internal Kafka spouts using pykafka balanced consumer.

https://gist.github.com/amontalenti/4f459e5e702cc77bd7ab

It’s a base class that you can override. I cleaned up the docs a little so you can see what you’re supposed to implement on your side. Typically. you implement message validation in is_ok and you implement transformations in get_data. Your subclass can override these two methods and have a spout implemented that is doing balanced consuming and offset management.

Here’s the issue — though this spout does implement next_tuple() and scales up, it doesn’t implement ack() and fail() logic. That’s on you, still. But you can make the Spout reliable by implementing these.

Hope that helps as a starting point! We’re obviously working to make this an official part of the pykafka / pystorm / streamparse experience over time.

Oct 23 '15 15:10 rduplain

streamparse streamparse copied to clipboard

Build a pure-Python Kafka Spout.

streamparse
streamparse copied to clipboard