spark-LDA-example
A simple Spark LDA example that demonstrates a full-fledged clustering pipeline, including data cleaning steps such as lemmatization and stemming.
This project contains a basic document clustering example that also performs data cleaning.
Document clustering is performed in the following steps:
- Spark RegexTokenizer: for tokenization
- Stanford NLP Morphology: for stemming and lemmatization
- Spark StopWordsRemover: for removing stop words and punctuation
- Spark TF-IDF: for computing term frequencies or TF-IDF
- Spark LDA: for clustering of documents
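
To make the preprocessing steps above concrete, here is a minimal plain-Python sketch of what tokenization, stop word removal, and TF-IDF weighting compute on a toy corpus. It is not this project's code and does not use the Spark API. The function names, the tiny stop word list, and the sample sentences are all assumptions made for illustration. The IDF uses the smoothed form `log((N + 1) / (df + 1))` that the Spark MLlib documentation describes. Stemming/lemmatization and LDA itself are left out.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop word list for illustration only;
# Spark's StopWordsRemover ships a much larger default list.
STOP_WORDS = {"a", "the", "is", "of", "and", "on"}


def tokenize(text):
    """Lowercase the text and split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def remove_stop_words(tokens):
    """Drop every token that appears in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Term frequency is the raw count in the document; IDF is the
    smoothed form log((N + 1) / (df + 1)), where N is the number of
    documents and df the number of documents containing the term.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((n + 1) / (count + 1)) for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]


# Toy corpus (made up for this sketch).
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "Stock prices rose on Monday.",
]
cleaned = [remove_stop_words(tokenize(text)) for text in corpus]
weights = tf_idf(cleaned)

print(cleaned[0])  # ['cat', 'sat', 'mat']
```

Terms that occur in fewer documents get larger weights: in the first document, `sat` is weighted higher than `cat`, because `cat` also appears in the second document. In the actual pipeline, vectors like these are what the LDA stage takes as input.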