
Norwegian NLP Resources

A work-in-progress list of useful NLP resources for Norwegian.

Please let us know if there are useful NLP resources we might have missed!

Contact me at [email protected]

Facebook Group

Join our Facebook Group: https://www.facebook.com/groups/nlpnorway/

Open Source Libraries

Libraries with support for the Norwegian language

spaCy

  • https://spacy.io/models/nb - Official Norwegian support in spaCy (since v2.2.0)
  • https://github.com/web64/spacy-norwegian - Train Norwegian models for spaCy
  • https://github.com/jarib/spacy-nb - Scripts to build a Norwegian model for spaCy
  • https://github.com/ohenrik/nb_news_ud_sm - Experimental Norwegian (Bokmål) language model for spaCy (including NER)
  • https://github.com/ohenrik/nb_dep_ud_sm - Experimental Norwegian (Bokmål) language model for spaCy
  • https://github.com/navikt/ai-lab-spacy-bokmaal - Norwegian model for spaCy

BERT

  • https://github.com/NBAiLab/notram - NoTraM - Norwegian Transformer Model
  • http://wiki.nlpl.eu/Vectors/norlm/norbert - NorBERT: Bidirectional Encoder Representations from Transformers
  • https://github.com/botxo/nordic_bert - Nordic BERT: Norwegian model (trained on 4.5 GB of text)

NLTK

Models

  • https://github.com/explosion/spacy-models/releases/tag/nb_core_news_sm-2.2.0 - Pretrained statistical models for Norwegian Bokmål
  • https://github.com/ljos/navnkjenner - Named-Entity Recognition for Norwegian Bokmål and Nynorsk
  • https://github.com/HIT-SCIR/ELMoForManyLangs - Pre-trained ELMo Representations
  • https://github.com/ltgoslo/norec-baselines - NoReC baseline models, trained on the NoReC dataset.
  • https://github.com/tensorflow/models/blob/master/syntaxnet/g3doc/universal.md - Syntaxnet models
  • https://github.com/andrely/Norwegian-NLP-models - 2013
  • https://github.com/emanlapponi/norlem-norwegian-lemmatizer - Lemmatizer for Norwegian that uses lexical and contextual information from the Norwegian Dependency Treebank (NDT)
  • https://stanfordnlp.github.io/stanfordnlp/installation_download.html#human-languages-supported-by-stanfordnlp - StanfordNLP Pretrained models: Bokmål, Nynorsk, NynorskLIA
  • https://github.com/mollerhoj/Scandinavian-ULMFiT - Weights for the embedding layer of a Scandinavian ULMFiT language model

Word Vectors

  • http://vectors.nlpl.eu/repository/ - NLPL word embeddings repository
  • https://github.com/bheinzerling/bpemb - GloVe word vectors based on Byte-Pair Encoding (BPE)
  • https://github.com/Kyubyong/wordvectors - Word2Vec and fastText word vectors for Bokmål and Nynorsk
  • https://fasttext.cc/docs/en/crawl-vectors.html - fastText word vectors trained on Common Crawl and Wikipedia
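Several of the vector sources above, such as the fastText crawl vectors, are distributed in the plain-text `.vec` format: a header line with the vocabulary size and the dimension, then one word followed by its values per line. A minimal stdlib-only sketch for loading such a file and comparing two words by cosine similarity, assuming that text format; the three-word sample below is made up for illustration and is not taken from any real Norwegian model:

```python
import io
import math
from typing import Dict, List, TextIO


def load_vec(stream: TextIO) -> Dict[str, List[float]]:
    """Parse the text .vec format: a 'count dim' header, then 'word v1 ... vN' per line."""
    header = stream.readline().split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or truncated lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy 3-dimensional sample (invented values); a real file would be read with
# open(path, encoding="utf-8") instead of io.StringIO.
SAMPLE = """3 3
konge 0.9 0.1 0.3
dronning 0.8 0.2 0.35
bil 0.1 0.9 0.0
"""

vectors = load_vec(io.StringIO(SAMPLE))
print(round(cosine(vectors["konge"], vectors["dronning"]), 3))
print(round(cosine(vectors["konge"], vectors["bil"]), 3))
```

Full fastText files contain millions of rows, so for real use a library loader (or reading only the first N lines) is usually preferable to holding everything in a Python dict.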

Norwegian specific libraries

  • https://github.com/textlab/mtag - The Oslo-Bergen Multitagger for Norwegian Bokmål and Nynorsk (Python)
  • https://github.com/ljos/anna_lyse - Language parser for Norwegian Bokmål and Nynorsk
  • https://github.com/petterhh/ndt-tools - Norwegian Dependency Treebank (NDT) tools
  • https://github.com/ljos/egennavn - Named-entity chunker for Norwegian
  • https://github.com/noklesta/The-Oslo-Bergen-Tagger - The Oslo Bergen Tagger
  • https://github.com/draperunner/obt - Python library for The Oslo-Bergen Tagger

Universal Dependencies

Data & Corpus

  • https://www.nb.no/sprakbanken/repositorium#ticketsfrom?lang=en&query=alle&tokens=&from=1&size=12&collection=sbr - Språkbankens ressurskatalog (the Norwegian Language Bank's resource catalogue): Norwegian n-grams, lexicons, news corpus
  • https://github.com/ltgoslo/norec - NoReC: The Norwegian Review Corpus
  • https://github.com/ltgoslo/talk-of-norway - Talk of Norway (ToN) dataset, a collection of Norwegian parliament speeches from 1998 to 2016
  • https://github.com/stopwords-iso/stopwords-no - Norwegian stopwords in JSON or txt format
  • https://github.com/ltgoslo/norne - NORwegian Named Entities
  • https://www.sketchengine.eu/notenten-norwegian-corpus/ - noTenTen: Corpus of the Norwegian Web
  • https://github.com/unhammer/fugeord - Fugeord
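The stopwords-no list is offered as a plain text file. A minimal sketch of filtering stopwords from a sentence, assuming one stopword per line; the file name `stopwords-no.txt` and the handful of words in the inline sample are illustrative assumptions, not the full list:

```python
import io
import re
from typing import List, Set, TextIO


def load_stopwords(stream: TextIO) -> Set[str]:
    """Read one stopword per line, ignoring blank lines."""
    return {line.strip().lower() for line in stream if line.strip()}


def remove_stopwords(text: str, stopwords: Set[str]) -> List[str]:
    """Lowercase, tokenize on word characters, and drop stopwords."""
    # \w is Unicode-aware for str patterns, so æ, ø and å stay inside tokens.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


# Tiny illustrative sample; in practice use open("stopwords-no.txt", encoding="utf-8").
SAMPLE = "og\ni\njeg\ndet\nat\nen\npå\n"

stopwords = load_stopwords(io.StringIO(SAMPLE))
print(remove_stopwords("Jeg bor i en liten by på Vestlandet", stopwords))
```

The regex tokenizer is deliberately naive; for real pipelines a proper tokenizer (for example from one of the spaCy models listed above) handles clitics, abbreviations and punctuation better.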

Sentiment Analysis for Norwegian Text

  • https://www.usit.uio.no/om/organisasjon/itf/ds/faglig/seminarer/spraak-teknologi-betydning/sant.pdf - SANT: Sentiment Analysis for Norwegian Text (PDF)
  • http://www.mn.uio.no/ifi/english/research/projects/sant/index.html - SANT project page (University of Oslo)
  • https://github.com/ltgoslo/norsentlex - NorSentLex: Norwegian sentiment lexicon of positive and negative words
  • https://github.com/olavski/afinn/blob/master/afinn/data/AFINN-no-165.txt - Work-in-progress AFINN Norwegian sentiment lexicon
  • https://github.com/web64/norec-fasttext - Train NoReC fastText sentiment analysis models
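Lexicons such as the AFINN file above assign integer valence scores to words, and the simplest use is to sum the scores of the words in a text. A minimal sketch of that idea, assuming the tab-separated `term<TAB>score` layout of the original English AFINN-165 list (check the Norwegian file before relying on this); the sample entries and scores are invented for illustration and are not taken from AFINN-no-165:

```python
import io
import re
from typing import Dict, TextIO


def load_afinn(stream: TextIO) -> Dict[str, int]:
    """Parse 'term<TAB>score' lines into a dict, skipping lines without a tab."""
    lexicon: Dict[str, int] = {}
    for line in stream:
        if "\t" not in line:
            continue
        term, score = line.rstrip("\n").rsplit("\t", 1)
        lexicon[term.lower()] = int(score)
    return lexicon


def score(text: str, lexicon: Dict[str, int]) -> int:
    """Sum scores of single-word tokens; multi-word lexicon entries are ignored in this sketch."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(lexicon.get(t, 0) for t in tokens)


# Invented entries for illustration only.
SAMPLE = "bra\t3\nfantastisk\t4\ndårlig\t-3\nkjedelig\t-2\n"

lexicon = load_afinn(io.StringIO(SAMPLE))
print(score("Filmen var fantastisk, men slutten var kjedelig", lexicon))
```

Summing lexicon scores ignores negation ("ikke bra") and context, which is one reason the trained models on NoReC listed above tend to be preferred for anything beyond a quick baseline.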

Machine Translation

  • https://github.com/UKPLab/EasyNMT
  • https://github.com/Animenosekai/translate

Apertium

Main library: https://github.com/apertium/apertium-python

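Language data and translation pairs:

  • https://github.com/apertium/apertium-nno-nob - Nynorsk-Bokmål translation pair
  • https://github.com/apertium/apertium-nno - Norwegian Nynorsk monolingual data
  • https://github.com/apertium/apertium-nob - Norwegian Bokmål monolingual data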

English-Norwegian parallel corpus

  • http://data.europa.eu/euodp/en/data/dataset/elrc_1061

Commercial APIs

Dictionaries

Papers

Related Resources
