DaCy
DaCy: The State-of-the-Art Danish NLP pipeline using spaCy
Currently, the model uses lookup-based lemmatization derived from the training set. This could be improved by adapting the `lemmy` package to spaCy v3. Another potential solution might be...
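To make the limitation of a pure lookup approach concrete, here is a minimal sketch of a lookup lemmatizer built from (form, lemma) training pairs. It is not DaCy's or `lemmy`'s code: `build_lemma_lookup`, `lemmatize` and the toy training pairs are hypothetical, and the "most frequent lemma wins" rule is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def build_lemma_lookup(pairs):
    """Build a form -> lemma table from (form, lemma) training pairs.

    Hypothetical helper: for each lowercased form, keep the lemma it was
    annotated with most often in the training data.
    """
    counts = defaultdict(Counter)
    for form, lemma in pairs:
        counts[form.lower()][lemma] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def lemmatize(tokens, table):
    # Words never seen in training fall back to their lowercased form,
    # which is the main weakness of a pure lookup approach.
    return [table.get(tok.lower(), tok.lower()) for tok in tokens]


# Toy training pairs (hypothetical data); "løb" is ambiguous (noun vs. verb).
train_pairs = [("Hunde", "hund"), ("hunde", "hund"),
               ("løb", "løbe"), ("løb", "løb"), ("løb", "løbe")]
table = build_lemma_lookup(train_pairs)
print(lemmatize(["Hunde", "løb", "katte"], table))  # ['hund', 'løbe', 'katte']
```

Note how "katte" is returned unchanged because it never occurred in training, and how the ambiguous "løb" always gets the same lemma regardless of context; a rule- or model-based lemmatizer such as `lemmy` is meant to address both cases.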
- [x] ConvBERT small
- [x] Ælæctra cased
- [x] Ælæctra uncased
- [x] ELECTRA
- [x] ConvBERT medium: won't be trained, as the small ConvBERT did not compete...
Add word frequencies estimated from Danish Gigaword as a language resource.
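As a rough illustration of what such a resource could contain, the sketch below turns a tokenized corpus into log relative frequencies plus a fallback value for unseen words. It assumes the Gigaword text has already been tokenized; `word_log_probs`, the `min_count` cutoff and the out-of-vocabulary rule are hypothetical choices, and how the table would be wired into spaCy is left open here.

```python
import math
from collections import Counter


def word_log_probs(tokens, min_count=1):
    """Estimate log relative frequencies from a tokenized corpus.

    Sketch only: assumes the corpus is already tokenized; words seen fewer
    than min_count times are left out of the table.
    """
    counts = Counter(tok.lower() for tok in tokens)
    total = sum(counts.values())
    probs = {w: math.log(c / total) for w, c in counts.items() if c >= min_count}
    # Hypothetical fallback: give unseen words a log probability just below
    # that of a word observed once.
    oov_log_prob = math.log(1 / (total + 1))
    return probs, oov_log_prob


tokens = "og i og det og i".split()
probs, oov = word_log_probs(tokens)
print(round(math.exp(probs["og"]), 3))  # 0.5
```

Storing log probabilities rather than raw counts keeps values comparable across corpora of different sizes, and the explicit OOV value avoids taking the log of zero for unseen words.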
Examine a potential discrepancy between the spaCy dependency parse and the DaCy dependency parse, noted by @rdkm89.
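One way to start examining such a discrepancy is to align the two parses token by token and measure how often heads and labels agree. The helper below is a hypothetical sketch: it assumes both pipelines produce identical tokenization, and that each parse is given as a list of `(head_index, deprel)` tuples. With spaCy `Doc` objects such a list could be built as `[(t.head.i, t.dep_) for t in doc]`.

```python
def compare_parses(tokens, parse_a, parse_b):
    """Compare two dependency parses of the same tokens.

    Each parse is a list of (head_index, deprel) tuples aligned with tokens.
    Returns unlabeled agreement, labeled agreement and the disagreements.
    """
    if not (len(tokens) == len(parse_a) == len(parse_b)):
        raise ValueError("parses must be aligned with the same tokens")
    if not tokens:
        return 1.0, 1.0, []
    same_head = 0
    same_both = 0
    diffs = []
    for tok, (ha, da), (hb, db) in zip(tokens, parse_a, parse_b):
        if ha == hb:
            same_head += 1
            if da == db:
                same_both += 1
        if (ha, da) != (hb, db):
            diffs.append((tok, (ha, da), (hb, db)))
    n = len(tokens)
    return same_head / n, same_both / n, diffs


# Toy example (hypothetical parses); the root points to itself, as in spaCy.
tokens = ["Hun", "læser", "bogen"]
parse_a = [(1, "nsubj"), (1, "ROOT"), (1, "obj")]
parse_b = [(1, "nsubj"), (1, "ROOT"), (1, "obl")]
uas, las, diffs = compare_parses(tokens, parse_a, parse_b)
print(uas, round(las, 3), diffs)
```

Listing the disagreeing tokens, not just the scores, makes it easier to see whether the discrepancy is systematic (e.g. one label consistently swapped for another) or scattered.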
After removing readability from DaCy, it would be nice to have a tutorial on "Extracting text statistics and readability metrics using DaCy and TextDescriptives", potentially using the packages to describe and examine the...
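A tutorial like this would likely feature LIX, the readability index commonly used for Danish and Swedish, which TextDescriptives reports among its readability metrics. LIX is the average sentence length in words plus the percentage of long words (more than six letters). The function below is an independent, simplified sketch, not TextDescriptives' implementation: its sentence splitting on `.`, `!` and `?` and its regex-based word matching are assumptions.

```python
import re


def lix(text):
    """LIX = words / sentences + 100 * long_words / words.

    Simplified sketch: sentences are split on . ! ? and words are runs of
    letters (including æ, ø, å); a long word has more than six letters.
    """
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


# 5 words, 2 sentences, 1 long word ("hurtigt"): 5/2 + 100 * 1/5 = 22.5
print(lix("Hunden løber hurtigt. Katten sover."))
```

In a DaCy-based tutorial, sentence and word boundaries would instead come from the pipeline's own segmentation, which is more robust than the regular expressions used here.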
Potentially using something like https://www.statestitle.com/resource/using-nlp-bert-to-improve-ocr-accuracy/. Another interesting read might be this blog post by Grammarly: https://www.grammarly.com/blog/engineering/gec-tag-not-rewrite/. It might also be relevant to check out Grammarly's GECToR: https://github.com/grammarly/gector. Potentially also check...
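The "tag, not rewrite" idea behind GECToR is to predict one edit tag per input token and then apply the tags, rather than generating the corrected sentence from scratch. The sketch below only shows the tag-application step with a simplified subset of GECToR's basic transformations (`$KEEP`, `$DELETE`, `$APPEND_<word>`, `$REPLACE_<word>`). GECToR's additional transformations, its start token and its iterative correction passes are omitted, and the Danish OCR-style example is hypothetical.

```python
def apply_edit_tags(tokens, tags):
    """Apply token-level edit tags in the spirit of GECToR's basic transformations.

    Simplified sketch: only $KEEP, $DELETE, $APPEND_<word> and
    $REPLACE_<word> are supported.
    """
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")
    out = []
    for tok, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(tok)
        elif tag == "$DELETE":
            continue
        elif tag.startswith("$APPEND_"):
            # Keep the token and insert a new word after it.
            out.extend([tok, tag[len("$APPEND_"):]])
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return out


# OCR-style noise: a duplicated word and a misread word (hypothetical example).
print(apply_edit_tags(["Han", "gik", "gik", "hjern"],
                      ["$KEEP", "$KEEP", "$DELETE", "$REPLACE_hjem"]))
```

Because the model only has to choose one tag per token, the approach can be framed as sequence tagging, which is the main argument made in the Grammarly blog post.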
For instance, using the spaCy implementation: https://github.com/TakeLab/spacy-udpipe
Based on the evaluation by [daLUKE](https://github.com/peleiden/daluke), it might be relevant to add WikiANN and Plank as OOB datasets for NER.
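Adding these datasets for evaluation mostly involves reading their token-level tags and scoring at the entity level. The sketch below assumes the data has been loaded as BIO-tagged token sequences (the actual file formats of WikiANN and Plank are not covered here). `bio_to_spans` and `entity_f1` are hypothetical helpers: an entity only counts as correct if both its boundaries and its label match exactly.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (start, end, label) spans, end exclusive.

    An I- tag that does not continue the current entity starts a new one.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]
    return spans


def entity_f1(gold_tags, pred_tags):
    """Entity-level F1 for one sequence (sum counts over sentences for a corpus)."""
    gold = set(bio_to_spans(gold_tags))
    pred = set(bio_to_spans(pred_tags))
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
print(bio_to_spans(gold), entity_f1(gold, pred))
```

Label sets differ between datasets, so in practice the labels would probably need to be mapped to a shared scheme before scores are comparable across datasets.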