nlp-data-augmentation
Data Augmentation for NLP.
NLP Data Augmentation
Paper
- Unsupervised Data Augmentation
- Unsupervised Question Answering by Cloze Translation
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
- It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations
Overview
- A Visual Survey of Data Augmentation in NLP
- Task-independent data augmentation for NLP
- Robust, Unbiased Natural Language Processing (pdf)
Methods
- General
- Random insertion, random deletion, and word or sentence shuffling
- Replacing words with synonyms
- Replacing words with other words from a dictionary of the same label
- Perturbations (letter, word, or sentence level)
- Language model
- Back translation
- Round-trip translation
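
The token-level methods listed under General can be illustrated with a minimal sketch. The function names, the toy `SYNONYMS` table, and the parameters `p` and `n_swaps` are assumptions made for illustration only; a real setup would likely draw synonyms from a resource such as WordNet or word embeddings.

```python
import random

# Toy synonym table (hypothetical); stands in for a real lexical resource.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
}

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n_swaps, rng):
    """Shuffle locally by swapping two random positions n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def synonym_replacement(tokens, synonyms, rng):
    """Replace every token that has an entry in the synonym table."""
    return [rng.choice(synonyms[t]) if t in synonyms else t for t in tokens]

def random_insertion(tokens, synonyms, n_inserts, rng):
    """Insert synonyms of known tokens at random positions."""
    out = list(tokens)
    candidates = [t for t in tokens if t in synonyms]
    if not candidates:
        return out
    for _ in range(n_inserts):
        word = rng.choice(synonyms[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), word)
    return out

rng = random.Random(0)  # fixed seed so augmentations are reproducible
tokens = "the quick dog is happy".split()
print(random_deletion(tokens, 0.2, rng))
print(random_swap(tokens, 1, rng))
print(synonym_replacement(tokens, SYNONYMS, rng))
print(random_insertion(tokens, SYNONYMS, 1, rng))
```

Passing an explicit `random.Random` instance keeps each augmentation reproducible and makes it easy to generate several distinct variants of the same sentence by changing the seed.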
Leverage External Data
- Using external data derived from Wikipedia by linking Wikipedia articles to arbitrary input text. The idea is that if the input text were on Wikipedia, it would link to other Wikipedia articles that are semantically related and provide additional information. The linking steps are:
- Break the input text into n-grams
- Check whether each n-gram exists as a Wikipedia article to create a set of "candidate links"
- Prune the candidate links by computing the similarity between the input text and the abstract of each candidate
- Conversational Systems
- Reading Comprehension
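
The Wikipedia-linking steps listed under Leverage External Data (n-gram extraction, candidate lookup, similarity pruning) could be sketched roughly as follows. This is an illustration under stated assumptions, not the method of any particular paper: `TOY_WIKI` is a hypothetical in-memory stand-in for a Wikipedia title-to-abstract lookup, and bag-of-words cosine similarity with a `threshold` cutoff is just one possible choice, since the similarity measure is not specified above.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a Wikipedia title -> abstract lookup.
TOY_WIKI = {
    "neural network": "A neural network is a model inspired by the brain "
                      "and used in machine learning.",
    "machine learning": "Machine learning is the study of algorithms "
                        "that learn from data.",
    "apple": "An apple is an edible fruit produced by an apple tree.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, max_n):
    """Yield every n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def cosine(a_tokens, b_tokens):
    """Bag-of-words cosine similarity (assumed similarity measure)."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def link_candidates(text, wiki, max_n=3, threshold=0.1):
    """Return {title: score} for n-grams that match a title and pass pruning."""
    tokens = tokenize(text)
    # Steps 1 and 2: n-grams that exist as article titles become candidates.
    candidates = {g for g in ngrams(tokens, max_n) if g in wiki}
    # Step 3: prune by similarity between the input text and each abstract.
    scored = {t: cosine(tokens, tokenize(wiki[t])) for t in candidates}
    return {t: s for t, s in scored.items() if s >= threshold}

text = "Training a neural network is a machine learning task."
for title, score in sorted(link_candidates(text, TOY_WIKI).items()):
    print(f"{title}: {score:.2f}")
```

The abstracts of the surviving links can then be appended to the input or used as extra training text. In practice the exact-match title lookup would likely need normalization (redirects, casing, disambiguation pages), which this sketch leaves out.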