German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

This list primarily includes resources and tools that are currently maintained and can be used either off the shelf or with minor adjustments. It is deliberately biased toward usability and user-friendliness.

Pull requests and suggestions are welcome! See the contributing guidelines.

Table of Contents

  • Text corpora
    • General-purpose
    • Historical
    • Specialized
    • Word lists
    • Data acquisition
    • Lists of corpora
  • Generic resources
    • Frameworks
    • Treebanks
    • Deep learning models and transformers
    • Annotation
    • Standards
  • Linguistic processing
    • Preprocessing
    • Tokenization / Sentence boundary detection
    • Stemming
    • Lemmatization
    • Morphological analysis
    • Normalization
    • Phonology
    • POS-tagging
    • Syntactical parsing
    • Named Entity Recognition
    • Industry/Applications
    • Evaluation
  • Semantic analysis
    • Datasets
    • Word embeddings and senses
    • Sentiment analysis datasets / polarity clues
    • Sentiment detection
    • GermEval
    • Coreference resolution
    • Summarization
    • Psycholinguistics
  • Speech NLP
  • Machine Translation
  • Teaching resources and tutorials
  • More lists
    • German
    • General
    • Comparable lists
    • Larger institutional GitHub groups

Text corpora

General-purpose

Historical

Specialized

Swiss German

Learner and Error Corpora

Word lists

Data acquisition

Lists of corpora

Generic resources

Frameworks

Treebanks

Deep learning models and transformers

Annotation

Standards

Linguistic processing

Preprocessing

Tokenization / Sentence boundary detection

Stemming

Lemmatization

Morphological analysis

Normalization

Phonology

POS-tagging

Syntactical parsing

Named Entity Recognition

Misc

Text generation

Industry/Applications

Evaluation

Semantic analysis

Datasets

Word embeddings and senses

Sentiment analysis datasets / polarity clues

Sentiment detection

GermEval

(category to improve)

Discourse

Summarization

Psycholinguistics

Speech NLP

Machine Translation

(category to improve)

Parallel corpora

Teaching resources and tutorials

More lists

German

General

Comparable lists

Larger institutional GitHub groups

Contributors

See the list of contributors.

License

CC-BY