big-data-scala-spark def occurrencesOfLang(lang: String, rdd: RDD[WikipediaArticle]): Int = ???

Open Jayyyyyyyyyyyyy opened this issue 8 years ago • 0 comments

Hello mate, according to the hints:

/** Returns the number of articles on which the language lang occurs.
 *
 *  Hint1: consider using method aggregate on RDD[T].
 *  Hint2: should you count the "Java" language when you see "JavaScript"?
 *  Hint3: the only whitespaces are blanks " "
 *  Hint4: no need to search in the title :)
 */

I think we should use the rdd.aggregate method to count every occurrence of lang, including repeated occurrences within a single element of the RDD. Here is my solution:

rdd.aggregate(0)(_ + _.text.split(" ").count(_ == lang), _ + _)

Maybe I am wrong, so I would like to hear your advice. Thank you, I appreciate it.
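To make the question concrete, here is a minimal sketch contrasting the two possible readings: counting every matching token (what the solution above does) versus counting each article at most once (what "number of articles" in the doc comment seems to suggest). It runs without Spark. WikipediaArticle is a stand-in case class with assumed fields, and aggregateLocally is a hypothetical helper that imitates RDD.aggregate by folding each partition and then merging the partial results. Neither is taken from the assignment code.

```scala
// Stand-in for the course's article type (assumption: it has title and text fields).
final case class WikipediaArticle(title: String, text: String)

object OccurrencesSketch {
  // seqOp that counts every token equal to `lang`, repeats included.
  def totalMentions(lang: String): (Int, WikipediaArticle) => Int =
    (acc, article) => acc + article.text.split(" ").count(_ == lang)

  // seqOp that counts an article at most once, however often `lang` appears in it.
  // Comparing whole tokens means "JavaScript" is not counted as "Java".
  def articlesMentioning(lang: String): (Int, WikipediaArticle) => Int =
    (acc, article) => if (article.text.split(" ").contains(lang)) acc + 1 else acc

  // Hypothetical local model of aggregate: fold each partition with seqOp,
  // then merge the per-partition results with the combOp `_ + _`.
  def aggregateLocally(partitions: Seq[Seq[WikipediaArticle]])(
      seqOp: (Int, WikipediaArticle) => Int): Int =
    partitions.map(_.foldLeft(0)(seqOp)).foldLeft(0)(_ + _)

  // Made-up sample data, split into two "partitions".
  val sample: Seq[Seq[WikipediaArticle]] = Seq(
    Seq(WikipediaArticle("A", "Java is not JavaScript but Java runs everywhere")),
    Seq(WikipediaArticle("B", "JavaScript only"), WikipediaArticle("C", "Scala and Java"))
  )

  val total: Int = aggregateLocally(sample)(totalMentions("Java"))         // 3
  val articles: Int = aggregateLocally(sample)(articlesMentioning("Java")) // 2
}
```

On a real RDD[WikipediaArticle], the corresponding call would presumably be rdd.aggregate(0)(OccurrencesSketch.articlesMentioning(lang), _ + _). Which of the two counts the assignment expects is exactly the open question here.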

Apr 26 '17 16:04 Jayyyyyyyyyyyyy