nlp-topic-models
Application of topic models for topic extraction and similarity search
Natural Language Processing (NLP) using Topic Modeling
Application of topic models, with a special focus on German texts.
Datasets:
- German Political Speeches
- TODO: Offenes Parlament
- TODO: Project Gutenberg
- TODO: German news articles
- TODO: German Wikipedia articles
Algorithms:
- TODO: LSI - Latent Semantic Indexing (SVD)
- LDA - Latent Dirichlet Allocation
- TODO: NMF - Non-negative Matrix Factorization
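To make the LDA entry above more concrete, here is a minimal sketch of LDA inference using collapsed Gibbs sampling, written with the Python standard library only. It is not this repository's implementation. The function name `lda_gibbs`, the hyperparameter defaults, and the toy German corpus are all assumptions made for illustration. Real experiments would normally rely on one of the tools listed below, such as Gensim or Mallet.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words).

    Hypothetical helper for illustration; alpha and beta are assumed
    symmetric Dirichlet priors, not values taken from this repository.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]          # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                        # n(k)
    assignments = []

    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic distributions.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word


# Toy corpus (assumed, already tokenized and lowercased).
docs = [
    "bundestag regierung wahl partei regierung".split(),
    "partei wahl bundestag kanzler".split(),
    "fussball tor spiel mannschaft tor".split(),
    "mannschaft spiel trainer fussball".split(),
]

theta, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(3) if c > 0]
    print(f"topic {k}: {top_words}")
for d, row in enumerate(theta):
    print(f"doc {d}: {[round(p, 2) for p in row]}")
```

Gibbs sampling is used here because it fits in a few lines. The original LDA paper listed in the bibliography describes variational inference instead, so the sketch shows the model rather than that paper's inference procedure.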
Tools:
- Gensim
- Mallet
- TODO: lda
- TODO: NLTK
- TODO: sklearn
- TODO: BigARTM
- TODO: Vowpal Wabbit (Online LDA)
- TODO: tmtoolkit
- TODO: tcma
Useful and inspirational resources
Topic Modeling Tutorials
About: Building, Evaluating, Visualizing Topic Models
- Gensim Tutorials
- Topics and Transformations
- Tutorial on Mallet in Python (2014-03-20)
- Mallet
- pyLDAvis Library
- http://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/master/notebooks/pyLDAvis_overview.ipynb
- Machine Learning Plus Tutorials (Topic Modeling, NLP)
- Topic modeling visualization – How to present the results of LDA models? (2018-12-04)
- LDA in Python – How to grid search best topic models? (2018-04-04)
- Topic Modeling with Gensim (2018-03-26)
- Lemmatization Approaches with Examples in Python (2018-10-02)
- Gensim Tutorial
- Data Science Plus Tutorials
- Topic Modeling in Python with NLTK and Gensim (2018-04-26)
- Evaluation of Topic Modeling: Topic Coherence (2018-05-03)
- Towards Data Science
- WZB Data Science Blog (NLP)
Topic Models applied on Wikipedia
- https://radimrehurek.com/gensim/wiki.html
- https://www.kdnuggets.com/2017/11/building-wikipedia-text-corpus-nlp.html
Other NLP
- https://github.com/adbar/German-NLP
Research
Data Sources
- Link List - Wissenschaftszentrum Berlin für Sozialforschung
- Link List - Institut für deutsche Sprache und Linguistik (HU Berlin)
- POLLUX - Informationsdienst Politikwissenschaft
- German Microdata Lab (gesis)
- Leipzig Corpora Collection
- DWDS Corpora
Bibliography
LDA
- David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet Allocation. In: Journal of Machine Learning Research, 2003
Sentiment
- R. Remus, U. Quasthoff & G. Heyer: SentiWS - a Publicly Available German-language Resource for Sentiment Analysis. In: Proceedings of the 7th International Language Resources and Evaluation (LREC'10), pp. 1168-1171, 2010