big-data-scala-spark

def occurrencesOfLang(lang: String, rdd: RDD[WikipediaArticle]): Int = ???

Open Jayyyyyyyyyyyyy opened this issue 8 years ago • 0 comments

Hello mate, according to the hints:

/** Returns the number of articles on which the language lang occurs.

  • Hint1: consider using method aggregate on RDD[T].
  • Hint2: should you count the "Java" language when you see "JavaScript"?
  • Hint3: the only whitespaces are blanks " "
  • Hint4: no need to search in the title :)

I think we should use the rdd.aggregate method to count all occurrences of lang, including repeated ones within a single element of the RDD. Here is my solution:

rdd.aggregate(0)(_ + _.text.split(" ").count(_ == lang), _ + _)

Maybe I am wrong, so I would like to hear your advice. Thank you, I appreciate it.
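
For comparison, here is a minimal sketch of the two readings. Taken literally, the doc comment asks for the number of articles in which lang occurs, which would count each article at most once, whereas the solution above counts every repeated token. The WikipediaArticle case class, the mentions and totalTokenCount helpers, and the sample data are assumptions made for illustration, and a local Seq with foldLeft stands in for the RDD so the snippet runs without Spark.

```scala
object OccurrencesSketch {
  // Assumed shape of the course's WikipediaArticle: a title and a text body.
  final case class WikipediaArticle(title: String, text: String)

  // 1 if `lang` appears as a whole blank-separated token in the text, else 0.
  // Whole-token matching means "JavaScript" is not counted as "Java" (Hint2).
  def mentions(lang: String, article: WikipediaArticle): Int =
    if (article.text.split(" ").contains(lang)) 1 else 0

  // Counts articles, not total occurrences. On an RDD the analogous call
  // would be: rdd.aggregate(0)((acc, a) => acc + mentions(lang, a), _ + _)
  def occurrencesOfLang(lang: String, articles: Seq[WikipediaArticle]): Int =
    articles.foldLeft(0)((acc, a) => acc + mentions(lang, a))

  // Counts every repeated token, mirroring the solution posted in the issue.
  def totalTokenCount(lang: String, articles: Seq[WikipediaArticle]): Int =
    articles.foldLeft(0)((acc, a) => acc + a.text.split(" ").count(_ == lang))

  // Hypothetical sample data: "Java" appears twice in A, once in B, never in C.
  val sample: Seq[WikipediaArticle] = Seq(
    WikipediaArticle("A", "Java Java JavaScript"),
    WikipediaArticle("B", "Scala and Java"),
    WikipediaArticle("C", "JavaScript only")
  )
}
```

On this sample, occurrencesOfLang("Java", sample) gives 2 while totalTokenCount("Java", sample) gives 3, so the two readings diverge as soon as one article repeats the language name.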

Jayyyyyyyyyyyyy, Apr 26 '17 16:04