✨ Awesome - A curated list of amazing Topic Models (implementations, libraries, and resources)

Awesome Topic Models

A curated list of amazing topic modelling libraries.

Contents

  • Libraries & Toolkits
  • Models
  • Techniques
  • Research Implementations
  • Visualizations
  • Resources
  • Related awesome lists

Libraries & Toolkits

  • gensim - Python library for topic modelling
  • scikit-learn - Python library for machine learning
  • tomotopy - Python extension for tomoto, a Gibbs-sampling-based topic modeling library written in C++
  • tomoto - Ruby extension for tomoto, a Gibbs-sampling-based topic modeling library written in C++
  • OCTIS - Python package to integrate, optimize and evaluate topic models
  • tmtoolkit - Python topic modeling toolkit with parallel processing power
  • Mallet - Java-based package for topic modeling
  • TopicModel4J - Java-based package for topic modeling
  • BIDMach - CPU and GPU-accelerated machine learning library
  • BigARTM - Fast topic modeling platform
  • TopicNet - A high-level Python interface for the BigARTM library
  • stm - R package for the Structural Topic Model
  • RMallet - R package to interface with the Java machine learning tool MALLET
  • R-lda - R package for topic modelling (LDA, sLDA, corrLDA, etc.)
  • topicmodels - R package with interface to C code for LDA and CTM
  • lda++ - C++ library for LDA and (fast) supervised LDA (sLDA/fsLDA) using variational inference

Models

The implementations listed below differ widely in performance and scalability, and in their support for advanced features such as hyperparameter tuning or evaluation.

Truncated Singular Value Decomposition (SVD) / Latent Semantic Analysis (LSA) / Latent Semantic Indexing (LSI)

Non-Negative Matrix Factorization (NMF or NNMF)

Latent Dirichlet Allocation (LDA) :page_facing_up:
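
Many of the implementations in this list fit LDA by Gibbs sampling. The following is a minimal pure-Python sketch of a collapsed Gibbs sampler, included only to illustrate the idea. It is not the code of any library listed here, and the function name `lda_gibbs`, its parameters, and the default values for `alpha` and `beta` are assumptions made for this example. It assumes documents are already encoded as lists of integer word ids.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimized).

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (doc_topic, topic_word) count matrices after the last sweep.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # tokens of word w in topic k
    topic_total = [0] * num_topics                               # all tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Hypothetical toy corpus with a vocabulary of four word ids.
toy_docs = [[0, 1, 0, 1], [2, 3, 2], [0, 1, 3]]
doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2, vocab_size=4, iters=50)
```

The returned count matrices can be normalized (after adding `alpha` and `beta`) to estimate the document-topic and topic-word distributions. Production libraries add sparsity tricks, parallelism, and hyperparameter optimization that this sketch leaves out.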

Hyperparameter optimization

Evaluation
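
One common way to evaluate topics is a coherence score computed from document co-occurrence counts. The sketch below computes the UMass coherence of a single topic in pure Python. The function name `umass_coherence` and the choice to skip words that never occur in the reference corpus are assumptions made for this example, and the libraries listed here may differ in smoothing and normalization.

```python
import math


def umass_coherence(top_words, docs, eps=1.0):
    """UMass coherence of one topic.

    top_words: the topic's top words, ordered from most to least probable.
    docs: reference corpus as a list of token lists.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_freq(top_words[l])
            if denom == 0:
                continue  # assumption: ignore words absent from the corpus
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + eps) / denom)
    return score


# Hypothetical reference corpus: "a" and "b" co-occur, "a" and "c" never do.
corpus = [["a", "b"], ["a", "b"], ["c", "d"]]
coherent = umass_coherence(["a", "b"], corpus)
incoherent = umass_coherence(["a", "c"], corpus)
```

Higher (less negative) scores indicate that a topic's top words tend to appear in the same documents.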

CPU-based high performance implementations

GPU-based high performance implementations

  • SaberLDA - GPU-based system that implements a sparsity-aware algorithm to achieve sublinear time complexity
  • GS-LDA-BIDMach - CPU and GPU-accelerated Scala implementation using Gibbs sampling
  • VB-LDA-BIDMach - CPU and GPU-accelerated Scala implementation using online variational Bayes inference

Hierarchical Dirichlet Process (HDP) :page_facing_up:

  • gensim - Python implementation using online variational inference :page_facing_up:
  • tomotopy - Python extension for C++ implementation using Gibbs sampling :page_facing_up:
  • Mallet - Java-based package for topic modeling using Gibbs sampling
  • TopicModel4J - Java implementation using Gibbs sampling based on Chinese restaurant franchise metaphor
  • hca - C implementation using Gibbs sampling with/without burstiness modelling
  • bnp - Cython reimplementation based on online-hdp following scikit-learn's API.
  • Scalable HDP - interesting paper
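
The Chinese restaurant franchise mentioned above builds on the simpler Chinese restaurant process (CRP), in which each new customer joins an existing table with probability proportional to its size or opens a new table with probability proportional to a concentration parameter. The following is a minimal sketch of the single-level CRP only, not the full franchise and not the code of any listed implementation. The function name and parameters are assumptions made for this example.

```python
import random


def chinese_restaurant_process(num_customers, concentration=1.0, seed=0):
    """Sample a seating arrangement from a CRP.

    Returns (seating, table_sizes): the table index chosen by each customer
    and the final number of customers at each table.
    """
    rng = random.Random(seed)
    table_sizes = []
    seating = []
    for _ in range(num_customers):
        # Existing table k has weight n_k; a new table has weight `concentration`.
        weights = table_sizes + [concentration]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        seating.append(table)
    return seating, table_sizes


seating, table_sizes = chinese_restaurant_process(20, concentration=1.0)
```

Because the number of tables grows with the data, this kind of prior lets nonparametric models such as HDP infer the number of topics instead of fixing it in advance.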

Hierarchical LDA (hLDA) :page_facing_up:

  • tomotopy - Python extension for C++ implementation using Gibbs sampling
  • Mallet - Java implementation using Gibbs sampling
  • hlda - Python package based on Mallet's Gibbs sampler having a fixed depth on the nCRP tree
  • hLDA - C implementation of hierarchical LDA by David Blei

Dynamic Topic Model (DTM) :page_facing_up:

Author-topic Model (ATM) :page_facing_up:

Labeled Latent Dirichlet Allocation (LLDA, Labeled-LDA, L-LDA) :page_facing_up:

Partially Labeled Dirichlet Allocation (PLDA) / Dirichlet Process (PLDP) :page_facing_up:

  • tomotopy - Python extension for C++ implementation using Gibbs sampling
  • TopicModel4J - Java implementation using collapsed Gibbs sampling
  • STMT - Scala implementation of PLDA & PLDP by Daniel Ramage

Dirichlet Multinomial Regression (DMR) topic model :page_facing_up:

  • tomotopy - Python extension for C++ implementation using Gibbs sampling
  • Mallet - Java-based package for topic modeling

Generalized Dirichlet Multinomial Regression (g-DMR) topic model :page_facing_up:

  • tomotopy - Python extension for C++ implementation using Gibbs sampling

Link LDA

Correlated Topic Model (CTM) a.k.a. logistic-normal topic models

Relational Topic Model (RTM)

Supervised LDA (sLDA) :page_facing_up:

  • tomotopy - Python extension for C++ implementation using Gibbs sampling
  • R-lda - R implementation using collapsed Gibbs sampling
  • slda - Cython implementation of Gibbs sampling for LDA and various sLDA variants
    • supervised LDA (linear regression)
    • binary logistic supervised LDA (logistic regression)
    • binary logistic hierarchical supervised LDA (trees)
    • generalized relational topic models (graphs)
  • YWWTools - Java implementation using Gibbs sampling for LDA and various sLDA variants:
    • BS-LDA: Binary SLDA
    • Lex-WSB-BS-LDA: BS-LDA with Lexical Weights and Weighted Stochastic Block Priors
    • Lex-WSB-Med-LDA: Lex-WSB-BS-LDA with Hinge Loss
  • sLDA - C++ implementation of supervised topic models with a categorical response

Topic Models for short documents

Sentence-LDA / SentenceLDA / Sentence LDA :page_facing_up:

Dirichlet Multinomial Mixture Model (DMM) :page_facing_up:

Dirichlet Process Multinomial Mixture Model (DPMM)

Pseudo-document-based Topic Model (PTM) :page_facing_up:

  • tomotopy - Python extension for C++ implementation using Gibbs sampling
  • TopicModel4J - Java implementation using collapsed Gibbs sampling

Biterm topic model (BTM)

  • TopicModel4J - Java implementation using collapsed Gibbs sampling
  • BTM - Original C++ implementation using collapsed Gibbs sampling :page_facing_up:
  • BurstyBTM - Original C++ implementation of the Bursty BTM (BBTM) :page_facing_up:
  • OnlineBTM - Original C++ implementation of online BTM (oBTM) and incremental BTM (iBTM) :page_facing_up:
  • R-BTM - R package wrapping the C++ code from BTM
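
BTM models unordered word pairs (biterms) that co-occur in the same short text, rather than modeling each document's words separately. The sketch below shows only the biterm extraction step, under the assumption that each short document is treated as a single context. The helper names `extract_biterms` and `count_biterms` are hypothetical and are not taken from the implementations above.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one short document.

    Each pair is sorted so that ("b", "a") and ("a", "b") count as the same
    biterm. A repeated word yields a pair of the word with itself; filter
    such pairs out if that is not wanted.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Aggregate biterm counts over a corpus of tokenized short documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical tokenized short texts.
short_docs = [["apple", "store", "iphone"], ["iphone", "apple"]]
biterm_counts = count_biterms(short_docs)
```

The topic model is then fitted to the corpus-wide collection of biterms, which helps with the sparsity of word co-occurrence within individual short documents.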

Others

  • STTM - Java implementation and evaluation of DMM, WNTM, PTM, ETM, GPU-DMM, GPU-DPMM, LF-DMM :page_facing_up:
  • SATM - Java implementation of Self-Aggregation Topic Model :page_facing_up:
  • shorttext - Python implementation of various algorithms for Short Text Mining

Miscellaneous topic models

Exotic models

Embedding based Topic Models

Probabilistic Programming Languages (PPL) (a.k.a. Build your own Topic Model)

Research Implementations

  • lda-c - C implementation using variational EM by David Blei
  • sLDA - C++ implementation of supervised topic models with a categorical response.
  • onlineldavb - Python online variational Bayes implementation by Matthew Hoffman :page_facing_up:
  • HDP - C++ implementation of hierarchical Dirichlet processes by Chong Wang
  • online-hdp - Python implementation of online hierarchical Dirichlet processes by Chong Wang
  • ctr - C++ implementation of collaborative topic models by Chong Wang
  • dtm - C implementation of dynamic topic models by David Blei & Sean Gerrish
  • ctm-c - C implementation of the correlated topic model by David Blei
  • diln - C implementation of Discrete Infinite Logistic Normal (with HDP option) by John Paisley
  • hLDA - C implementation of hierarchical LDA by David Blei
  • turbotopics - Python implementation that finds significant multiword phrases in topics by David Blei
  • Stanford Topic Modeling Toolbox - Scala implementation of LDA, labeledLDA, PLDA, PLDP by Daniel Ramage and Evan Rosen
  • LDAGibbs - Java implementation of LDA using Gibbs sampling by Liu Yang
  • Matlab Topic Modeling Toolbox - Matlab implementations of LDA, ATM, HMM-LDA, LDA-COL (Collocation) models by Mark Steyvers and Tom Griffiths
  • cvbLDA - Python C extension implementation of collapsed variational Bayesian inference for LDA
  • fast - A Fast And Scalable Topic-Modeling Toolbox (Fast-LDA, CVB0) by Arthur Asuncion and colleagues :page_facing_up:

Popular Implementations (but not maintained anymore)

Learning Implementations (hopefully easy to understand)

  • topic_models - Python implementation of LSA, PLSA and LDA
  • Topic-Model - Python implementation of LDA, Labeled LDA, ATM, Temporal Author-Topic Model using Gibbs sampling

Visualizations

  • LDAvis - R package for interactive topic model visualization
  • pyLDAvis - Python library for interactive topic model visualization
  • scalaLDAvis - Scala port of pyLDAvis
  • dtmvisual - Python package for visualizing DTM (trained with gensim)
  • TMVE online - Online Django variant of topic model visualization engine (TMVE)
  • TMVE - Original topic model visualization engine (LDA trained with lda-c) :page_facing_up:
  • topicmodel-lib - Python wrapper for TMVE for visualizing LDA (trained with topicmodel-lib)
  • wordcloud - Python package for visualizing topics via word_cloud
  • Mallet-GUI - GUI for creating and analyzing topic models produced by MALLET
  • TWiC - Topic Words in Context is a highly-interactive, browser-based visualization for MALLET topic models
  • dfr-browser - Explore Mallet's topic models of texts in a web browser
  • Termite - Explore topic models using term-topic matrix, group-in-a-box visualization or scatter plot.
  • Topics - Python library for topic modeling and visualization
  • TopicsExplorer - Explore your own text collection with a topic model – without prior knowledge :page_facing_up:
  • topicApp - A Simple Shiny App for Topic Modeling
  • stminsights - A Shiny Application for Inspecting Structural Topic Models

Dirichlet hyperparameter optimization techniques

Resources

  • David Blei - David Blei's Homepage with introductory materials

Related awesome lists

Contribute

Contributions welcome! Read the contribution guidelines first.

License

CC0

To the extent possible under law, Jonathan Schneider has waived all copyright and related or neighboring rights to this work.