big-data-scala-spark
def occurrencesOfLang(lang: String, rdd: RDD[WikipediaArticle]): Int = ???
Hello mate,
according to the hints:
/** Returns the number of articles on which the language lang occurs.
 * - Hint1: consider using method aggregate on RDD[T].
 * - Hint2: should you count the "Java" language when you see "JavaScript"?
 * - Hint3: the only whitespaces are blanks " "
 * - Hint4: no need to search in the title :)
 */

I think we should use the rdd.aggregate method to count all occurrences of lang, including repeated ones within a single element of the RDD. Here is my solution:

rdd.aggregate(0)(_ + _.text.split(" ").count(_ == lang), _ + _)

Maybe I am wrong, so I would like to hear your advice.

Thank you, I appreciate it.
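One thing worth checking against the doc comment: it asks for the number of articles in which lang occurs, while the one-liner adds up every matching token, so an article that mentions the language twice counts twice. Below is a minimal sketch of both readings side by side, not a reference solution. It assumes WikipediaArticle is a case class with title and text fields; the object name OccurrencesSketch and the helpers mentions and tokenOccurrences are made up for illustration.

```scala
import org.apache.spark.rdd.RDD

// Assumed shape of the course's article type; only `text` is used here.
case class WikipediaArticle(title: String, text: String)

object OccurrencesSketch {
  // True if `lang` appears as a whole blank-separated token in the text,
  // so "JavaScript" is not counted as "Java" (Hint2, Hint3).
  def mentions(lang: String, article: WikipediaArticle): Boolean =
    article.text.split(' ').contains(lang)

  // Article count: each article contributes at most 1 to the total.
  def occurrencesOfLang(lang: String, rdd: RDD[WikipediaArticle]): Int =
    rdd.aggregate(0)(
      (acc, article) => if (mentions(lang, article)) acc + 1 else acc, // within a partition
      _ + _                                                            // across partitions
    )

  // Token count: every matching token is added, as in the one-liner above.
  def tokenOccurrences(lang: String, rdd: RDD[WikipediaArticle]): Int =
    rdd.aggregate(0)(_ + _.text.split(" ").count(_ == lang), _ + _)
}
```

For three articles with the texts "Java Java JavaScript", "Scala is fun" and "I like Java", occurrencesOfLang("Java", rdd) gives 2 and tokenOccurrences("Java", rdd) gives 3. Which one the assignment wants depends on how you read "number of articles", but the first matches the doc comment more literally.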