DaCy issues

Results 16 DaCy issues

Improve lemmatization

Currently, the model uses lookup-based lemmatization built from the training set. This can be improved by adapting the `lemmy` package for v3 of spaCy. Another potential solution might be...

KennethEnevoldsen

enhancement
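
For context on the lemmatization issue above, the following is a minimal sketch of what lookup-based lemmatization built from a training set can look like. It is not DaCy's actual implementation; the function names (`build_lookup`, `lemmatize`) and the toy training triples are hypothetical and only illustrate why such an approach struggles with forms it has never seen.

```python
from collections import Counter, defaultdict


def build_lookup(pairs):
    """Map each (lowercased form, POS) to its most frequent lemma in the training data."""
    counts = defaultdict(Counter)
    for form, pos, lemma in pairs:
        counts[(form.lower(), pos)][lemma] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def lemmatize(form, pos, table):
    # Forms not seen in training fall back to the lowercased form itself,
    # which is the main weakness of a pure lookup approach.
    return table.get((form.lower(), pos), form.lower())


# Hypothetical toy training data: (form, POS, lemma)
train = [
    ("huse", "NOUN", "hus"),
    ("huset", "NOUN", "hus"),
    ("løber", "VERB", "løbe"),
    ("løber", "NOUN", "løber"),
]
table = build_lookup(train)
```

A rule-based or learned lemmatizer, such as the `lemmy` package mentioned in the issue, would aim to generalize to unseen forms instead of returning them unchanged.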

New language models to fine-tune

- [x] ConvBERT small
- [x] Ælæctra Cased
- [x] Ælæctra uncased
- [x] ELECTRA
- [x] ConvBERT medium: won't be trained, as the small ConvBERT did not compete...

KennethEnevoldsen

enhancement

Upload names.csv to huggingface model hub

KennethEnevoldsen

enhancement

Add resource: Danish word frequencies DAGW

Add word frequencies estimated from Danish Gigaword as a language resource.

KennethEnevoldsen

enhancement

Examine discrepancy between spaCy dependency parse and DaCy dependency parse

Examine a potential discrepancy between spaCy dependency parse and DaCy dependency parse noted by @rdkm89.

KennethEnevoldsen

enhancement

Add Tutorials: "Extracting text statistics and readability metrics using DaCy and Textdescriptives"

After removing readability, it would be nice to have a tutorial on "Extracting text statistics and readability metrics using DaCy and Textdescriptives". Potentially using the packages to describe the examining the...

KennethEnevoldsen

enhancement

Remove matcher from pipeline to avoid raised warning.

KennethEnevoldsen

enhancement

Add a spelling correction module

Potentially using something like: https://www.statestitle.com/resource/using-nlp-bert-to-improve-ocr-accuracy/. Another interesting read might be this blog post by Grammarly: https://www.grammarly.com/blog/engineering/gec-tag-not-rewrite/. Here it might also be relevant to check out Grammarly's GECToR: https://github.com/grammarly/gector. Potentially also check...

KennethEnevoldsen

enhancement

Add UDPipe to comparisons

For instance, using the spaCy implementation: https://github.com/TakeLab/spacy-udpipe

KennethEnevoldsen

enhancement

adding wikiANN and plank

Based on the evaluation by [daLUKE](https://github.com/peleiden/daluke), it might be relevant to add WikiANN and Plank as OOB datasets for NER.

KennethEnevoldsen

enhancement
