summingbird-hybrid-example

data never gets ingested

Open obh opened this issue 10 years ago • 16 comments

Thanks for this awesome example. I think I'm stuck on something really stupid. Basically, I got things working, except that on starting the program Zookeeper throws this error: "no brokers found when trying to rebalance." After that I can see the events being produced and put into the Kafka queue, but nothing gets ingested (Events Ingested is always 0).

obh avatar Apr 16 '14 11:04 obh

I am facing the same issue. Did this ever get fixed? If there is a workaround, can someone please share their experience?

kandu009 avatar Nov 04 '14 21:11 kandu009

Are you using Kafka 7 or 8? I've seen this issue before when using the wrong Kafka consumer version, i.e. 7 instead of 8 or vice versa.

Try using this PR from Tormenta https://github.com/twitter/tormenta/pull/52 if you are using Kafka 8.

If you are using Kafka 7 then you will probably have to adapt this example to use the original Kafka-Tormenta API instead of https://github.com/kscaldef/summingbird-hybrid-example/blob/master/src/main/scala/com/twitter/tormenta/spout/KafkaSpout.scala

ghost avatar Nov 10 '14 19:11 ghost

Thanks for the reply.

I have tried using Kafka 8 and followed the instructions, but it doesn't help. Are there any clear instructions on what changes need to be made here?

I have been looking at your second option of using Kafka 7. If possible, could you share your modified hybrid example for Kafka 7?

kandu009 avatar Nov 18 '14 14:11 kandu009

@upio I have added more details regarding issues with Kafka 8 + SummingBird here https://github.com/kscaldef/summingbird-hybrid-example/issues/2

kandu009 avatar Nov 18 '14 21:11 kandu009

I've put together a modified example using Docker and my patched Tormenta for you here https://github.com/upio/summingbird-hybrid-example

There are instructions in there, but you'll need Docker, fig, and https://github.com/upio/tormenta installed in your local Maven repository.

See if this works for you.

ghost avatar Nov 18 '14 21:11 ghost

Hi,

Thanks for sharing the details. I am able to run this, but I see several exceptions, errors, and warnings:

1. 14/11/18 23:20:01 WARN producer.BrokerPartitionInfo: Error while fetching metadata [{TopicMetadata for topic summingbird.proto.productview ->
No partition metadata for topic summingbird.proto.productview due to kafka.common.LeaderNotAvailableException}] for topic [summingbird.proto.productview]: class kafka.common.LeaderNotAvailableException

2. 14/11/18 23:20:01 ERROR async.DefaultEventHandler: Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: summingbird.proto.productview

3. 14/11/18 23:05:00 WARN scalding.Scalding: Store: List() has no commutativity setting. Assuming MonoidIsCommutative(NonCommutative)
14/11/18 23:05:00 INFO scalding.Scalding: Store: List() is non-commutative (less efficient than commutative)

4. 14/11/18 23:05:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/18 23:05:05 WARN snappy.LoadSnappy: Snappy native library not loaded

5. Though it sees data in these partitions and counts them periodically, I see these warnings in every loop.

14/11/18 23:02:59 WARN kafka.KafkaUtils: No data found in Kafka Partition partition_1
14/11/18 23:03:03 WARN kafka.KafkaUtils: No data found in Kafka Partition partition_0

kandu009 avatar Nov 19 '14 05:11 kandu009

Did you change your IP address in the fig.yml file? I think this is what causes the LeaderNotAvailable issues. Make sure the advertised host in fig.yml is set to your IP.

Also, make sure to clean up what you've already done with Docker:

  1. Change the IP
  2. fig stop
  3. fig rm
  4. fig up -d
  5. rm -rf /tmp/summingbird-proto/

See if this fixes it.

ghost avatar Nov 19 '14 05:11 ghost

Yeah, I did change the IP as mentioned in the README. After doing what you suggested, I still see all of the above-mentioned errors and warnings.

kandu009 avatar Nov 19 '14 05:11 kandu009

I see those warnings too. I'm not sure if they are important, but it works, so I don't think so. It's something to do with your Kafka setup. Can you show me the output of:

  ifconfig
  fig ps
  cat fig.yml

Have you tried using the Kafka and Zookeeper CLI tools to see if you can connect to Kafka and Zookeeper? I still think it's an issue with Kafka not being able to communicate with the Zookeeper Docker container. I've had this exact issue before, and the problem is always the IP address in fig.yml. What operating system are you using? I haven't tested this with boot2docker on Mac/Windows.
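
One quick way to rule out basic connectivity problems before trying the CLI tools is to check whether the published ports accept TCP connections at all. The following is a minimal Python sketch and is not part of the example repo: `port_is_open` is a hypothetical helper, `docker_host_ip` is a placeholder for whatever address you put in fig.yml, and the port numbers are the host-side mappings from the `fig ps` output posted in this thread, which will likely differ on your machine.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Assumption: replace with the IP you configured in fig.yml.
    docker_host_ip = "127.0.0.1"
    # Assumption: host ports as shown by `fig ps` in this thread; yours may differ.
    services = [("kafka", 49155), ("zookeeper", 49154), ("memcached", 49153)]
    for name, port in services:
        state = "reachable" if port_is_open(docker_host_ip, port) else "NOT reachable"
        print(f"{name:10s} {docker_host_ip}:{port} {state}")
```

A refused connection here points at the IP or the port mapping rather than at Summingbird itself. An open port only shows that something is listening; it does not prove that the broker advertises an address clients can reach.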

ghost avatar Nov 19 '14 05:11 ghost

fig ps:

Name                                  Command                         State  Ports
summingbirdhybridexample_kafka_1      /bin/sh -c start-kafka.sh       Up     0.0.0.0:49155->9092/tcp
summingbirdhybridexample_memcached_1  memcached                       Up     0.0.0.0:49153->11211/tcp
summingbirdhybridexample_zookeeper_1  /opt/zookeeper-3.4.5/bin/z ...  Up     0.0.0.0:49154->2181/tcp, 2888/tcp, 3888/tcp

I used the same IP address in fig.yml that ifconfig shows under eth0 inet. I am using Windows.

I haven't tried using the CLI tools yet. Will try that out and see.

kandu009 avatar Nov 19 '14 06:11 kandu009

@upio Is there a way to specify multiple hosts to run this entire setup? I mean, run Storm on host1 and Scalding on host2, and run the hybrid on one of those hosts? Thanks in advance.

kandu009 avatar Nov 22 '14 16:11 kandu009

Well, there is no way to specify multiple hosts, but you can manually run the StormRunner and ScaldingRunner from different machines and then change the Memcached addresses for the Hybrid Store. In the end, all these runners do is launch jobs on a Storm/Hadoop cluster and load data into two separate serving layers like Memcached/Cassandra/HBase. An example of this setup would be awesome.

ghost avatar Dec 03 '14 19:12 ghost

https://github.com/upio/summingbird-hybrid-example works for me

dkwestbr avatar Dec 10 '14 22:12 dkwestbr

Using upio's forked example, I get a lot of errors that look like:

WARN state.ConnectionStateManager: There are no ConnectionStateListeners registered.

ERROR producer.SyncProducer: Producer connection to localhost:49155 unsuccessful java.net.ConnectException: Connection refused

I think I am using the correct IP, the one from docker0 in ifconfig. I've also tried a bunch of IPs (eth0 etc).

Any ideas?

jak3chase avatar Jul 16 '15 11:07 jak3chase

@jak3chase can you open an issue on the forked version and include fig ps and information about your environment (Linux, OS X, or Windows, for example)? The first things that come to mind are boot2docker, port forwarding, and binding to localhost instead of 0.0.0.0.
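
When comparing `fig ps` output across environments, it can help to pull out the bind address and host port for each container rather than reading them by eye. Below is a minimal Python sketch, assuming the Ports column uses the `ip:host_port->container_port/proto` form seen in this thread and that each container is on its own line. `published_ports` is a hypothetical helper and is not part of fig.

```python
import re

# Matches entries such as "0.0.0.0:49155->9092/tcp" in the Ports column.
PORT_MAPPING = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)->(\d+)/(?:tcp|udp)")


def published_ports(fig_ps_output):
    """Map each container name to a list of (bind_ip, host_port, container_port)."""
    result = {}
    for line in fig_ps_output.splitlines():
        mappings = PORT_MAPPING.findall(line)
        if not mappings:
            continue  # header, separator, or a container with no published ports
        name = line.split()[0]
        result[name] = [(ip, int(hp), int(cp)) for ip, hp, cp in mappings]
    return result


if __name__ == "__main__":
    # Sample lines copied from the `fig ps` output posted in this thread.
    sample = """\
summingbirdhybridexample_kafka_1 /bin/sh -c start-kafka.sh Up 0.0.0.0:49155->9092/tcp
summingbirdhybridexample_zookeeper_1 /opt/zookeeper-3.4.5/bin/z ... Up 0.0.0.0:49154->2181/tcp, 2888/tcp, 3888/tcp
"""
    for name, ports in published_ports(sample).items():
        for bind_ip, host_port, container_port in ports:
            print(f"{name}: {bind_ip}:{host_port} -> {container_port}")
```

A bind address of 127.0.0.1 instead of 0.0.0.0 in this output would be one sign of the localhost-binding problem mentioned above. With boot2docker, the host ports are published on the VM's address rather than on the Mac or Windows machine's own localhost.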

ghost avatar Jul 16 '15 19:07 ghost

@upio Thanks a lot for the reply! Unfortunately I wasn't able to open an issue on the forked repository after looking for a bit. Perhaps you haven't enabled Issues?

Anyways, I'm running OS X 10.10.13, and Java 7. My fig ps looks exactly like the one on the README, and I added the modified Tormenta to my maven repo.

fig ps:

summingbirdhybridexample_kafka_1 /bin/sh -c start-kafka.sh Up 0.0.0.0:49155->9092/tcp
summingbirdhybridexample_memcached_1 memcached Up 0.0.0.0:49153->11211/tcp
summingbirdhybridexample_zookeeper_1 /opt/zookeeper-3.4.5/bin/z ... Up 0.0.0.0:49154->2181/tcp, 2888/tcp, 3888/tcp

jak3chase avatar Jul 17 '15 06:07 jak3chase