summingbird-hybrid-example
data never gets ingested
Thanks for this awesome example. I think I'm stuck on something really stupid. Everything works except that, when the program starts, Zookeeper logs the error "no brokers found when trying to rebalance." After that I can see events being produced and put into the Kafka queue, but nothing gets ingested (Events Ingested is always 0).
I am facing the same issue. Was it ever fixed? If there is a workaround, could someone please share their experience?
Are you using Kafka 7 or 8? I've seen this issue before when using the wrong Kafka consumer version, i.e. 7 instead of 8 or vice versa.
Try using this PR from Tormenta https://github.com/twitter/tormenta/pull/52 if you are using Kafka 8.
If you are using Kafka 7 then you will probably have to adapt this example to use the original Kafka-Tormenta API instead of https://github.com/kscaldef/summingbird-hybrid-example/blob/master/src/main/scala/com/twitter/tormenta/spout/KafkaSpout.scala
Thanks for the reply.
I have tried using Kafka 8 and followed the instructions, but it doesn't help. Are there clear instructions on exactly which changes need to be made here?
I have been looking at your second option of using Kafka 7. If possible, could you share your modified hybrid example for Kafka 7?
@upio I have added more details regarding issues with Kafka 8 + SummingBird here https://github.com/kscaldef/summingbird-hybrid-example/issues/2
I've put together a modified example using Docker and my patched Tormenta for you here https://github.com/upio/summingbird-hybrid-example
There are instructions in there, but you'll need Docker, fig, and https://github.com/upio/tormenta installed in your local Maven repository.
See if this works for you.
Hi,
Thanks for sharing the details. I am able to run this, but I see several exceptions, errors, and warnings:
1. 14/11/18 23:20:01 WARN producer.BrokerPartitionInfo: Error while fetching metadata [{TopicMetadata for topic summingbird.proto.productview ->
No partition metadata for topic summingbird.proto.productview due to kafka.common.LeaderNotAvailableException}] for topic [summingbird.proto.productview]: class kafka.common.LeaderNotAvailableException
2. 14/11/18 23:20:01 ERROR async.DefaultEventHandler: Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: summingbird.proto.productview
3. 14/11/18 23:05:00 WARN scalding.Scalding: Store: List() has no commutativity setting. Assuming MonoidIsCommutative(NonCommutative)
14/11/18 23:05:00 INFO scalding.Scalding: Store: List() is non-commutative (less efficient than commutative)
4. 14/11/18 23:05:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/18 23:05:05 WARN snappy.LoadSnappy: Snappy native library not loaded
5. Though it sees data in these partitions and counts them periodically, I see these warnings in every loop.
14/11/18 23:02:59 WARN kafka.KafkaUtils: No data found in Kafka Partition partition_1
14/11/18 23:03:03 WARN kafka.KafkaUtils: No data found in Kafka Partition partition_0
Did you change your IP address in the fig.yml file? I think this is what causes the LeaderNotAvailable issues. Make sure the advertised host in fig.yml is your IP.
Also, make sure to clean up what you've already done with Docker:
- Change the IP
- fig stop
- fig rm
- fig up -d
- rm -rf /tmp/summingbird-proto/
See if this fixes it.
Yeah, I did change the IP as mentioned in the README. After doing what you suggested, I still see all of the errors and warnings listed above.
I see those warnings too. I'm not sure whether they are important, but it works for me, so I don't think so. It's something to do with your Kafka setup. Can you show me the output of:
- ifconfig
- fig ps
- cat fig.yml
Have you tried using Kafka CLI tools and Zookeeper CLI tools to see if you can connect to Kafka and Zookeeper? I still think it's an issue with Kafka not being able to communicate with the Zookeeper Docker container. I've had this exact issue before and the problem is always the IP address in fig.yml. What operating system are you using? I haven't tested this with boot2docker on mac/windows.
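If the Kafka and Zookeeper CLI tools are not at hand, a plain TCP probe can give a first answer to the connectivity question above. The following is only a minimal sketch: the host `127.0.0.1` and the ports 49155 (Kafka) and 49154 (Zookeeper) are assumptions taken from the `fig ps` output posted in this thread, so substitute the advertised IP from your fig.yml and the ports your own `fig ps` reports. It only shows whether a port accepts connections; it does not confirm that the broker has registered itself correctly in Zookeeper.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def check_services(host, services, timeout=2.0):
    """Map each service name to whether its port is reachable on host."""
    return {name: can_connect(host, port, timeout) for name, port in services.items()}


if __name__ == "__main__":
    # Assumed values: replace with your advertised IP and the ports from `fig ps`.
    results = check_services("127.0.0.1", {"kafka": 49155, "zookeeper": 49154})
    for name, ok in results.items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

If Kafka's port is reachable from the host but the producer still reports LeaderNotAvailable, the advertised host in fig.yml is the more likely culprit than the port mapping.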
fig ps:
Name Command State Ports
summingbirdhybridexample_kafka_1 /bin/sh -c start-kafka.sh Up 0.0.0.0:49155->9092/tcp
summingbirdhybridexample_memcached_1 memcached Up 0.0.0.0:49153->11211/tcp
summingbirdhybridexample_zookeeper_1 /opt/zookeeper-3.4.5/bin/z ... Up 0.0.0.0:49154->2181/tcp, 2888/tcp, 3888/tcp
In fig.yml I used the same IP address that ifconfig reports under eth0 inet. I am using Windows.
I haven't tried using the CLI tools yet. Will try that out and see.
@upio Is there a way to specify multiple hosts to run this entire setup? I mean, run Storm on host1 and Scalding on host2, and run the hybrid on either host1 or host2? Thanks in advance.
Well, there is no way to specify multiple hosts, but you can manually run the StormRunner and ScaldingRunner from different machines and then change the Memcached addresses for the Hybrid Store. Ultimately, all these runners do is launch jobs on a Storm/Hadoop cluster and load data into two separate serving layers such as Memcached/Cassandra/HBase. An example of this setup would be awesome.
https://github.com/upio/summingbird-hybrid-example works for me
Using upio's forked example, I get a lot of errors that look like:
WARN state.ConnectionStateManager: There are no ConnectionStateListeners registered.
ERROR producer.SyncProducer: Producer connection to localhost:49155 unsuccessful java.net.ConnectException: Connection refused
I think I am using the correct IP, the one from docker0 in ifconfig. I've also tried a bunch of other IPs (eth0, etc.).
Any ideas?
@jak3chase can you open an issue on the forked version and include the output of fig ps and information about your environment (Linux, OS X, or Windows, for example)? The first things that come to mind are boot2docker, port forwarding, and binding to localhost instead of 0.0.0.0.
@upio Thanks a lot for the reply! Unfortunately, I couldn't find a way to open an issue on the forked repository. Perhaps you haven't enabled Issues?
Anyways, I'm running OS X 10.10.13, and Java 7. My fig ps looks exactly like the one on the README, and I added the modified Tormenta to my maven repo.
fig ps:
summingbirdhybridexample_kafka_1 /bin/sh -c start-kafka.sh Up 0.0.0.0:49155->9092/tcp
summingbirdhybridexample_memcached_1 memcached Up 0.0.0.0:49153->11211/tcp
summingbirdhybridexample_zookeeper_1 /opt/zookeeper-3.4.5/bin/z ... Up 0.0.0.0:49154->2181/tcp, 2888/tcp, 3888/tcp
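To compare what a client connects to (for example `localhost:49155` in the error above) with what Docker actually published, it can help to pull the bind address and host port out of a `fig ps` line. The following is a rough sketch, assuming the `address:hostport->containerport/proto` format shown in the output above; `host_port_for` and `MAPPING` are hypothetical names introduced for illustration, not part of fig or Docker.

```python
import re

# Matches one published mapping such as "0.0.0.0:49155->9092/tcp".
MAPPING = re.compile(r"(\d+\.\d+\.\d+\.\d+):(\d+)->(\d+)/(tcp|udp)")


def host_port_for(fig_ps_line, container_port):
    """Return (bind_address, host_port) published for container_port, or None."""
    for addr, host_port, cport, _proto in MAPPING.findall(fig_ps_line):
        if int(cport) == container_port:
            return addr, int(host_port)
    # Ports listed without "->" (e.g. "2888/tcp") are exposed but not published.
    return None


kafka_line = "summingbirdhybridexample_kafka_1 /bin/sh -c start-kafka.sh Up 0.0.0.0:49155->9092/tcp"
print(host_port_for(kafka_line, 9092))
```

A bind address of 0.0.0.0 means the port is published on every interface of the Docker host. With boot2docker that host is the VM rather than the Mac itself, which is one possible reason a connection to `localhost` is refused.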