topicModels
Topic model extensions for Mallet & scikit-learn
Mallet Extension
The Mallet package contains only two topic models: LDA and Hierarchical LDA.
So I implemented some additional topic modeling methods on top of it.
Model:
- Hierarchical Dirichlet Process with Gibbs Sampling. (in the `HDP` folder)
- Inference part for hLDA. (in the `hLDA` folder)
Usage:
- This is an extension for Mallet, so you need to have Mallet's source code first.
- Put `HDP.java`, `HDPInferencer.java`, and `HierarchicalLDAInferencer.java` in the `src/cc/mallet/topics` folder.
- If you are going to run HDP, make sure you include the `knowceans` package in your project.
- Running `HDPTest.java` or `hLDATest.java` will give you a demo on a small dataset in the `data` folder.
References:
Scikit-learn Extension
Note:
This extension was merged into scikit-learn in version 0.17.
Model:
- Online LDA with variational inference. (in the `LDA` folder)
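For intuition about what "online" means here, the sketch below shows the step-size schedule and blending step described in the Hoffman et al. paper listed under Reference. It is not taken from `lda.py`. The function names and the default values for `tau0` and `kappa` are assumptions chosen for illustration.

```python
def step_size(t, tau0=10.0, kappa=0.7):
    """Weight given to mini-batch number t: rho_t = (tau0 + t) ** (-kappa).

    tau0 (hypothetical default) slows down the early iterations, and
    kappa (hypothetical default) controls how fast old information is forgotten.
    """
    return (tau0 + t) ** (-kappa)


def online_update(lam, lam_hat, rho):
    """Blend the current topic parameters with the mini-batch estimate.

    lam     -- current variational parameters for one topic (list of floats)
    lam_hat -- estimate computed from the current mini-batch only
    rho     -- step size returned by step_size()
    """
    return [(1.0 - rho) * old + rho * new for old, new in zip(lam, lam_hat)]


# Later mini-batches get smaller weights, so the estimate settles down.
weights = [step_size(t) for t in range(5)]
```

In scikit-learn's `LatentDirichletAllocation`, the corresponding parameters are named `learning_offset` (for `tau0`) and `learning_decay` (for `kappa`).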
Usage:
- Make sure `numpy`, `scipy`, and `scikit-learn` are installed.
- Run `python test` in the `lda` folder to run the unit tests.
- The onlineLDA model is in `lda.py`.
- For a quick example, running `python lda_example.py online` will fit a 10-topic model on the 20 Newsgroups dataset. `online` means the model is fit with online updates (the `partial_fit` method). Changing `online` to `batch` will fit the model with batch updates (the `fit` method).
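As a rough sketch of the difference between the two modes, the snippet below uses the `LatentDirichletAllocation` class that ships with scikit-learn rather than the `lda.py` in this repository. The toy corpus, the two-topic setting, and the mini-batch size are assumptions made for illustration. `n_components` is the parameter name in recent scikit-learn releases (it was `n_topics` in 0.17).

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Tiny made-up corpus, only for illustration.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "my dog chased the cat",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
    "the market rallied on bank earnings",
]

# Bag-of-words counts, shape (n_docs, n_words).
X = CountVectorizer().fit_transform(docs)

# Batch update: every iteration sees the whole corpus.
batch_lda = LatentDirichletAllocation(
    n_components=2, learning_method="batch", random_state=0
)
batch_lda.fit(X)

# Online update: feed the corpus in mini-batches through partial_fit.
online_lda = LatentDirichletAllocation(
    n_components=2, total_samples=X.shape[0], random_state=0
)
batch_size = 2  # hypothetical mini-batch size
for start in range(0, X.shape[0], batch_size):
    online_lda.partial_fit(X[start:start + batch_size])

# Document-topic matrix, shape (n_docs, n_components).
doc_topics = online_lda.transform(X)
```

With `partial_fit`, the model never needs the full corpus in memory at once, which is the main reason to prefer the online mode on large datasets.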
Reference:
- Scikit-learn
- onlineLDA
- "Online Learning for Latent Dirichlet Allocation", Matthew D. Hoffman, David M. Blei, Francis Bach
Others:
- Another HDP implementation can be found in my bnp repository. It also follows the scikit-learn API and is optimized with Cython.