augmenty
Augmenty is an augmentation library based on spaCy for augmenting texts.
Following PR #31, we should add a tutorial on how to train using augmenty.
The following is a list of potential new augmenters. If you want a specific augmenter to be added before others, please update the issue corresponding to the augmenter (if it...
It currently seems that the best solution for batched augmentation is to apply it directly to the corpus (perhaps using a custom data loader).
- [ ] Add entity, names ... (check NL augment)
- [ ] https://github.com/GEM-benchmark/NL-Augmenter/tree/main/nlaugmenter/transformations/gender_culture_diverse_name_two_way
Augmentation can be used to oversample a category. Imagined usage would look something like this:
```
aug = augmenty.load(...)

def is_positive(example):
    """Return True if the example contains an entity."""
    if...
```
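The oversampling idea can be sketched without any augmenty-specific calls. In this minimal sketch, `oversample`, `random_upper`, and the `(text, label)` example format are all assumptions made for illustration; in practice the stand-in augmenter would presumably be replaced by one loaded from augmenty.

```python
import random
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (text, label); a simplified stand-in for a training example

_rng = random.Random(0)  # seeded so the sketch is reproducible


def oversample(
    examples: Iterable[Example],
    is_target: Callable[[Example], bool],
    augment: Callable[[str], str],
    n_copies: int = 2,
) -> List[Example]:
    """Keep every example and add n_copies augmented variants of each target example."""
    out: List[Example] = []
    for example in examples:
        out.append(example)
        if is_target(example):
            text, label = example
            out.extend((augment(text), label) for _ in range(n_copies))
    return out


def random_upper(text: str) -> str:
    """Hypothetical stand-in augmenter: upper-case each character with probability 0.5."""
    return "".join(c.upper() if _rng.random() < 0.5 else c for c in text)


data = [("great library", "positive"), ("too slow", "negative")]
balanced = oversample(data, lambda ex: ex[1] == "positive", random_upper, n_copies=2)
print(len(balanced))  # 4: the two originals plus two augmented positives
```

Passing the selection rule as a predicate keeps the oversampling logic independent of how a category is defined, whether by label, by presence of an entity, or by something else.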
Add sampling of entities (such as names or addresses) from https://faker.readthedocs.io/en/master/locales/da_DK.html. This tool supports random sampling of entities for numerous languages.
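A rough sketch of how sampled entities might be swapped into a text while keeping the entity offsets valid. `replace_entities`, the character-offset span format, and the fixed name list are assumptions for illustration only, and spans are assumed not to overlap; this is not how augmenty implements entity replacement.

```python
import random
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of an entity


def replace_entities(
    text: str, spans: List[Span], sampler: Callable[[], str]
) -> Tuple[str, List[Span]]:
    """Replace each span with a sampled value; return the new text and updated spans."""
    parts: List[str] = []
    new_spans: List[Span] = []
    cursor = 0  # position in the original text
    length = 0  # length of the text built so far
    for start, end in sorted(spans):
        before = text[cursor:start]
        replacement = sampler()
        parts.extend([before, replacement])
        length += len(before)
        new_spans.append((length, length + len(replacement)))
        length += len(replacement)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), new_spans


# Hypothetical fixed name list so the sketch runs without extra dependencies.
rng = random.Random(0)
names = ["Anne Jensen", "Lars Nielsen", "Mette Hansen"]
new_text, new_spans = replace_entities(
    "Peter met Kenneth.", [(0, 5), (10, 17)], lambda: rng.choice(names)
)
print(new_text, new_spans)
```

With Faker installed, the sampler could instead be a provider method such as `Faker("da_DK").name`, which returns a new random name on each call.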
Augment a document using back translation through various languages, e.g., using Hugging Face models: https://huggingface.co/models?pipeline_tag=translation. Example blog: https://dzlab.github.io/dltips/en/pytorch/text-augmentation/ **Example sentence:** Augmenty is an augmentation library based on spaCy for augmenting...
spaCy has a more deliberate sense2vec extension, which might give better word replacements than the word embedding replacer.
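Back translation reduces to composing two translation functions. The sketch below assumes the translators are plain callables and uses toy word-level lookup tables (hypothetical, only so the example runs offline); `back_translate` and `toy_translator` are illustrative names, not part of augmenty.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate text into a pivot language and back to produce a paraphrase."""
    return from_pivot(to_pivot(text))


# Toy word-level translation tables (hypothetical) standing in for real models.
EN_DA: Dict[str, str] = {"big": "stor", "large": "stor", "house": "hus"}
DA_EN: Dict[str, str] = {"stor": "large", "hus": "house"}


def toy_translator(table: Dict[str, str]) -> Translator:
    """Build a translator that maps each whitespace-separated word through the table."""
    return lambda text: " ".join(table.get(word, word) for word in text.split())


augmented = back_translate("a big house", toy_translator(EN_DA), toy_translator(DA_EN))
print(augmented)  # a large house
```

With the `transformers` library, each callable could wrap a `pipeline("translation", model=...)` object and return the `"translation_text"` field of its first result. Which models to pair is left open here.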
Current entity augmenters do not handle entity links as one would use for entity linking. Ideally, entity formatters should keep the same link, while entity replacers could potentially add a...