
A simple Spark LDA example that demonstrates a full-fledged clustering algorithm, with data cleaning through processes such as lemmatization and stemming.

spark-LDA-example

A simple Spark LDA example. This project contains a basic document clustering example that also performs data cleaning.

Document clustering is performed in the following steps:

  1. Spark RegexTokenizer : For Tokenization

  2. Stanford NLP Morphology : For Stemming and lemmatization

  3. Spark StopWordsRemover : For removing stop words and punctuation

  4. Spark TF-IDF : For computing term frequencies or tf-idf

  5. Spark LDA : For Clustering of documents.
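
The steps above could be wired together roughly as follows. This is a minimal sketch rather than the project's actual code: it assumes the Spark ML (DataFrame-based) API, uses `CountVectorizer` plus `IDF` for the TF-IDF step, and leaves out the Stanford NLP lemmatization step so that it depends on Spark alone. The object name `LdaPipelineSketch`, the column names, the parameter values, and the sample documents are all hypothetical and chosen only for illustration.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, IDF, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.SparkSession

// Hypothetical object name; not taken from the project.
object LdaPipelineSketch {

  // Chains tokenization -> stop-word removal -> TF -> IDF -> LDA.
  // The Stanford NLP lemmatization step (step 2) is intentionally omitted here.
  def buildPipeline(numTopics: Int): Pipeline = {
    val tokenizer = new RegexTokenizer()
      .setInputCol("text")
      .setOutputCol("tokens")
      .setPattern("\\W+") // split on runs of non-word characters, which also drops punctuation

    val remover = new StopWordsRemover()
      .setInputCol("tokens")
      .setOutputCol("filtered")

    // Raw term counts per document.
    val tf = new CountVectorizer()
      .setInputCol("filtered")
      .setOutputCol("tf")

    // Rescale the counts by inverse document frequency.
    val idf = new IDF()
      .setInputCol("tf")
      .setOutputCol("features")

    val lda = new LDA()
      .setK(numTopics)
      .setMaxIter(10)
      .setFeaturesCol("features")

    new Pipeline().setStages(Array[PipelineStage](tokenizer, remover, tf, idf, lda))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lda-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample documents, for illustration only.
    val docs = Seq(
      (0L, "Spark makes distributed data processing simple."),
      (1L, "Topic models group documents by the words they share."),
      (2L, "Stop words and punctuation are removed before clustering.")
    ).toDF("id", "text")

    val model = buildPipeline(numTopics = 2).fit(docs)

    // Each document gets a vector of topic weights in "topicDistribution".
    model.transform(docs).select("id", "topicDistribution").show(false)

    spark.stop()
  }
}
```

To add the lemmatization step, one option would be a custom transformer or UDF placed between the tokenizer and the stop-word remover, rewriting each token with Stanford NLP before the term counts are computed.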