HashMap in WordCount

Open HarvinderBhullar opened this issue 8 years ago • 3 comments

Hi Idris/Sathish

How is the wordCount HashMap in your sample code going to behave in a clustered environment?

Br Harvinder

HarvinderBhullar, Nov 14 '16 09:11

Hi Harvinder, the word count example in the samples is just a toy example. To run in clustered mode, you need to use a partitioned source. For example, publish sentences to a Kafka topic partitioned by sentence, and use KafkaSource instead of RandomSentenceSource in the computation. All the components of a computation (source and processors) run within a single JVM, so the computation as a whole is scaled horizontally by spawning additional processes. The KafkaSource balances the topic's partitions across the running instances.
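To make the consequence for the HashMap concrete, here is a minimal sketch in plain Java. It does not use fabric or the Kafka client; the class `PartitionedWordCountSketch` and its methods are hypothetical names for illustration, and `String.hashCode()` is only a stand-in for Kafka's real partitioning hash. Each partition is counted into its own local map, the way separate JVM processes would each hold their own HashMap, so every instance ends up with partial counts only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of fabric or the Kafka client.
class PartitionedWordCountSketch {

    // Map a sentence to a partition by hashing the key.
    // Assumption: String.hashCode() stands in for Kafka's actual hash function.
    static int partitionFor(String sentence, int numPartitions) {
        return Math.floorMod(sentence.hashCode(), numPartitions);
    }

    // Count words the way a single-JVM processor with a local HashMap would.
    static Map<String, Integer> countWords(List<String> sentences) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("the quick fox", "the lazy dog", "the end");
        int numPartitions = 2;

        // One bucket of sentences per partition, i.e. per consuming instance.
        List<List<String>> byPartition = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            byPartition.add(new ArrayList<>());
        }
        for (String sentence : sentences) {
            byPartition.get(partitionFor(sentence, numPartitions)).add(sentence);
        }

        // Each instance only sees its own partitions, so its map is partial.
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("instance " + p + " local counts: "
                    + countWords(byPartition.get(p)));
        }
    }
}
```

A word that occurs in sentences on different partitions is counted separately in each process, so a global count needs either a merge step or a store shared by all instances.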

godofwharf, Nov 15 '16 10:11

My question was about the target HashMap. Anyway, are you saying that this HashMap is going to be a KafkaSink in a distributed environment?

HarvinderBhullar, Nov 15 '16 10:11

Sure, you can rewrite the WordCounter processor to use a distributed cache like Redis or Hazelcast to maintain counts in cluster mode.
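As a rough sketch of that idea, the counter below is written against `java.util.concurrent.ConcurrentMap` instead of a private HashMap. The class `SharedWordCounterSketch` is a hypothetical name and does not implement fabric's processor interface. The assumption is that the injected map could be a distributed one: Hazelcast's `IMap` implements `ConcurrentMap`, while with Redis the server-side `INCR` command would be the usual equivalent. Here a `ConcurrentHashMap` shared by two counter objects stands in for the distributed store.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration; not fabric's WordCounter processor.
class SharedWordCounterSketch {

    // Assumption: in cluster mode this would be backed by a distributed map.
    private final ConcurrentMap<String, Long> counts;

    SharedWordCounterSketch(ConcurrentMap<String, Long> counts) {
        this.counts = counts;
    }

    // Increment the shared count for every word in the sentence.
    void process(String sentence) {
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
    }

    long countOf(String word) {
        return counts.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        // Stand-in for the distributed store shared by all instances.
        ConcurrentMap<String, Long> shared = new ConcurrentHashMap<>();

        // Two counters simulate two processes consuming different partitions.
        SharedWordCounterSketch instanceA = new SharedWordCounterSketch(shared);
        SharedWordCounterSketch instanceB = new SharedWordCounterSketch(shared);

        instanceA.process("the quick fox");
        instanceB.process("the lazy dog");

        System.out.println("the = " + instanceA.countOf("the"));
    }
}
```

Because both counters write to the same map, either one reports the combined count. How atomic and how expensive `merge` is on a real distributed map should be checked against that product's documentation before relying on it.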

godofwharf, Nov 15 '16 10:11