spark-LDA-example
A simple Spark LDA example. This project demonstrates a full-fledged document clustering pipeline, including data cleaning steps such as stemming and lemmatization.
Document clustering is performed with the following steps:
Spark RegexTokenizer: for tokenization
Stanford NLP Morphology: for stemming and lemmatization
Spark StopWordsRemover: for removing stop words and punctuation
Spark TF-IDF: for computing term frequencies or TF-IDF weights
Spark LDA: for clustering the documents