DaCy: The State of the Art Danish NLP pipeline using SpaCy

Results 16 DaCy issues

Currently, the model uses lookup-based lemmatization built from the training set. This can be improved by adapting the `lemmy` package for v. 3 of spaCy. Another potential solution might be...
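To make the limitation concrete, here is a minimal sketch of what lookup-based lemmatization amounts to. It is not DaCy's actual implementation: the function names, the token-only key, and the toy training pairs are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_lemma_lookup(training_pairs):
    """Map each lowercased token to the lemma it was most often paired with.

    `training_pairs` is a hypothetical iterable of (token, lemma) tuples,
    standing in for the annotated training set.
    """
    counts = defaultdict(Counter)
    for token, lemma in training_pairs:
        counts[token.lower()][lemma] += 1
    # Keep only the most frequent lemma per token.
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def lemmatize(token, lookup):
    """Return the stored lemma, or the token unchanged if it was never seen."""
    return lookup.get(token.lower(), token)


# Toy data, invented for this example only.
pairs = [
    ("hunde", "hund"),
    ("hunden", "hund"),
    ("løber", "løbe"),
    ("løber", "løbe"),
    ("løber", "løber"),
]
lookup = build_lemma_lookup(pairs)
```

With this table, `lemmatize("hunde", lookup)` gives `"hund"`, while an unseen form such as `"katte"` is returned unchanged. That gap for unseen word forms is the kind of weakness a rule-based or learned lemmatizer is meant to address.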

enhancement

- [x] ConvBERT small
- [x] Ælæctra Cased
- [x] Ælæctra uncased
- [x] ELECTRA
- [x] ConvBERT medium - Won't be trained, as the small ConvBERT did not compete...

enhancement

Add word frequencies estimated from Danish Gigaword as a language resource.

enhancement

Examine a potential discrepancy between spaCy dependency parse and DaCy dependency parse noted by @rdkm89.

enhancement

After removing readability, it would be nice to have a tutorial on "Extracting text statistics and readability metrics using DaCy and TextDescriptives". Potentially using the packages to describe the examining the...

enhancement

Potentially using something like: https://www.statestitle.com/resource/using-nlp-bert-to-improve-ocr-accuracy/
Another interesting read might be this blog post by Grammarly: https://www.grammarly.com/blog/engineering/gec-tag-not-rewrite/
Here it might also be relevant to check out Grammarly's GECToR: https://github.com/grammarly/gector
Potentially also check...

enhancement

For instance, using the spaCy implementation: https://github.com/TakeLab/spacy-udpipe

enhancement

Based on the evaluation by [daLUKE](https://github.com/peleiden/daluke), it might be relevant to add WikiANN and Plank as OOB datasets for NER.

enhancement