nlpaug
Guide for NER Augmentation
Thanks for sharing your work. I could not find any NLP augmentation library other than this one.
Will this library help in augmenting NER data?
My data looks like this:
Ryan B-PER
Dsouza B-PER
/DOB O
11/11/1997 B-DOB
/MALE O
22 B-NUM
56565 B-NUM
Thanks in advance
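For reference, here is a minimal sketch of reading this two-column format into (token, tag) pairs, which any tag-aware augmentation would need first. It assumes each non-blank line holds one token followed by its tag, separated by whitespace. The function name `parse_token_tag_lines` is hypothetical and not part of nlpaug.

```python
def parse_token_tag_lines(text: str) -> list[tuple[str, str]]:
    """Parse lines of the form 'TOKEN TAG' into (token, tag) pairs.

    Assumes the tag is the last whitespace-separated field on each line;
    blank lines are skipped. A line with only one field raises ValueError.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split once from the right so the tag is always the last field
        token, tag = line.rsplit(maxsplit=1)
        pairs.append((token, tag))
    return pairs


sample = """\
Ryan B-PER
Dsouza B-PER
/DOB O
11/11/1997 B-DOB
/MALE O
22 B-NUM
56565 B-NUM
"""

pairs = parse_token_tag_lines(sample)
print(pairs[:2])  # [('Ryan', 'B-PER'), ('Dsouza', 'B-PER')]
```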
This library does not support generating augmented data for NER problems yet.
I can enhance it if there are any research papers related to this problem.
Maybe I can help. I have a custom dataset that I need to augment; maybe you can include that in your library?
(Emailed reply to Edward Ma's comment of 09-Aug-2019, 10:18 PM.)
Thanks for your contribution.
Please share the relevant papers with me so I can check whether this can be supported.
I'm really interested in this as well, as I am trying to do NER with a limited dataset. I'm not aware of any papers looking at this specifically, but I think it might be interesting to combine it with a data-generating DSL like Chatette (I actually asked about the problems nlpaug tackles in this issue: https://github.com/SimGus/Chatette/issues/25).
I think a useful first step might be to make the substitutions tag-aware, so that no substitution changes a token's tag. You might also want a flag that prevents substitutions on tagged (i.e. non-'O') words altogether.
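A minimal sketch of that idea, assuming one-token-for-one-token substitutions so the token and tag lists stay aligned. The synonym table `SYNONYMS`, the function `tag_aware_substitute`, and its `skip_tagged` flag are all hypothetical illustrations, not nlpaug APIs; in practice the candidates could come from WordNet, embeddings, or a contextual model.

```python
import random

# Hypothetical synonym table used only for illustration
SYNONYMS = {
    "likes": ["enjoys", "loves"],
    "dogs": ["puppies", "hounds"],
}


def tag_aware_substitute(tokens, tags, synonyms, p=0.5, skip_tagged=True, rng=None):
    """Replace some tokens with synonyms while keeping tags aligned.

    Each replacement is a single token and inherits the tag of the token
    it replaces, so len(tokens) == len(tags) still holds afterwards.
    With skip_tagged=True only tokens tagged 'O' are eligible, so entity
    words are never touched.
    """
    rng = rng or random.Random()
    new_tokens = []
    for token, tag in zip(tokens, tags):
        eligible = token in synonyms and (tag == "O" or not skip_tagged)
        if eligible and rng.random() < p:
            new_tokens.append(rng.choice(synonyms[token]))
        else:
            new_tokens.append(token)
    # Tags are unchanged because substitution is strictly one-to-one
    return new_tokens, list(tags)


tokens = ["Peter", "likes", "dogs"]
tags = ["B-PER", "O", "O"]
new_tokens, new_tags = tag_aware_substitute(
    tokens, tags, SYNONYMS, p=1.0, rng=random.Random(0)
)
print(new_tokens, new_tags)
```

Multi-word synonyms would break the one-to-one alignment, which is why the sketch restricts itself to single-token replacements.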
This of course presumes the existence of a labelled, if small, dataset, which I think is totally reasonable. I think combining context-aware vector substitutions with a DSL, and maybe some gazetteer pipelines to streamline external inputs, could be really powerful, and a cool project to work on if anyone is interested!
@Zylatis Thank you for your input. A DSL could be one solution. I will look further into how nlpaug could support a DSL.
In the meantime, you could use the "stopwords" attribute to simulate tag-aware behavior. You can change the list of stopwords for each augmentation.
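import nlpaug.augmenter.word as naw
text = "Peter likes dogs"
# Augmenter that substitutes words using a contextual language model
aug = naw.ContextualWordEmbsAug()
# Words listed as stopwords are skipped, so the entity "Peter" is left unchanged
aug.stopwords = ['Peter']
print(aug.augment(text))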
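Building on the snippet above, the per-sentence stopwords list can be derived directly from the NER tags, so every entity token is protected without listing names by hand. This is a sketch; `entity_stopwords` is a hypothetical helper, and it assumes tokens and tags are parallel lists with 'O' marking non-entities.

```python
def entity_stopwords(tokens, tags, outside_tag="O"):
    """Return the tokens carrying an entity tag (anything other than 'O').

    Using this list as an augmenter's stopwords keeps entity tokens fixed,
    which approximates tag-aware augmentation.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(t for t, tag in zip(tokens, tags) if tag != outside_tag))


tokens = ["Peter", "likes", "dogs"]
tags = ["B-PER", "O", "O"]
print(entity_stopwords(tokens, tags))  # ['Peter']
```

The result would be recomputed for each sentence and assigned to `aug.stopwords` before calling `aug.augment(...)`. The augmented text still has to be re-aligned with its tags afterwards, which is straightforward only when substitutions are one word for one word.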
Hi,
I was looking for this as well. The code snippet above is certainly helpful,
but there is another use case in which we might want to substitute an NER-tagged word with another word.
Is there any example of this?
Here is a simple custom NER augmenter that might help:
https://gist.github.com/manishiitg/8fd4209fcb3c6cb08ed34705c1f32c86
Hi @makcedward @manishiitg, are there any recent improvements for creating synthetic NER data? For example:
Original text: `My name is Pratik. I live in India`
Augmented versions could be:
- `My name is Jon. I live in U.S.A`
- `My name is Manish. I live in China`
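One way to produce augmentations like these is entity replacement: swap each tagged mention for another entity of the same type and rebuild the BIO tags. The sketch below assumes BIO tagging and a small hand-made gazetteer; `GAZETTEER`, `entity_spans`, and `replace_mentions` are hypothetical names, not nlpaug APIs, and this is not claimed to be what the gist above does.

```python
import random

# Hypothetical gazetteer: replacement mentions (as token lists) per entity type
GAZETTEER = {
    "PER": [["Jon"], ["Manish"], ["Ryan", "Dsouza"]],
    "LOC": [["U.S.A"], ["China"]],
}


def entity_spans(tags):
    """Return (start, end, type) for each BIO entity span; end is exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel 'O' closes a final span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, etype))
            start = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def replace_mentions(tokens, tags, gazetteer, rng=None):
    """Swap each entity mention for a random same-type gazetteer entry.

    A replacement may have a different number of tokens than the original
    mention, so B-/I- tags are regenerated for every inserted mention.
    """
    rng = rng or random.Random()
    new_tokens, new_tags, prev = [], [], 0
    for start, end, etype in entity_spans(tags):
        # Copy the untagged context between the previous span and this one
        new_tokens += tokens[prev:start]
        new_tags += tags[prev:start]
        if etype in gazetteer:
            mention = rng.choice(gazetteer[etype])
            new_tokens += mention
            new_tags += ["B-" + etype] + ["I-" + etype] * (len(mention) - 1)
        else:
            # No candidates for this type: keep the original mention
            new_tokens += tokens[start:end]
            new_tags += tags[start:end]
        prev = end
    new_tokens += tokens[prev:]
    new_tags += tags[prev:]
    return new_tokens, new_tags


tokens = ["My", "name", "is", "Pratik", ".", "I", "live", "in", "India"]
tags = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC"]
aug_tokens, aug_tags = replace_mentions(tokens, tags, GAZETTEER, rng=random.Random(1))
print(" ".join(aug_tokens), aug_tags)
```

Because mentions are replaced as whole spans, a two-token name such as "Ryan Dsouza" is tagged `B-PER I-PER`, and the surrounding context keeps its original tags.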