streamparse
streamparse copied to clipboard
Build a pure-Python Kafka Spout.
@kbourgoin to define this.
+1
Working through this with @amontalenti at the moment. Having a "Kafka Spout" base class as part of streamparse would be very nice. Ideally the base class should be smart enough to handle the exceptions coming from PyKafka. Willing to help test this or help develop this with you guys.
It should also be smart enough to pick up where it left off after topology shut downs, like the JVM-based one does. That's the really hard part.
From @amontalenti:
Here’s a gist of a first stab at a cleaned up Kafka spout based on our internal Kafka spouts using pykafka balanced consumer.
https://gist.github.com/amontalenti/4f459e5e702cc77bd7ab
It’s a base class that you can override. I cleaned up the docs a little so you can see what you’re supposed to implement on your side. Typically. you implement message validation in
is_okand you implement transformations inget_data. Your subclass can override these two methods and have a spout implemented that is doing balanced consuming and offset management.Here’s the issue — though this spout does implement
next_tuple()and scales up, it doesn’t implementack()andfail()logic. That’s on you, still. But you can make the Spout reliable by implementing these.Hope that helps as a starting point! We’re obviously working to make this an official part of the pykafka / pystorm / streamparse experience over time.