Edward Ma comments

Results: 27 comments of Edward Ma

Guide for NER Augmentation

This library does not yet support generating augmented data for NER problems. I can add it if there are research papers related to this problem.

Guide for NER Augmentation

Thanks for your contribution. Please share the corresponding papers with me so I can check whether it can be supported.

Guide for NER Augmentation

@Zylatis Thank you for your input. A DSL can be one solution for that. I will work out how nlpaug can support a DSL. Before that, you may consider to leverage...

Proposal to integrate into Hugging Face Hub

Thank you, Omar. I will review the approach.

Cannot use any of the Russian BERT models below

Is this the exception? If yes, it will be fixed soon: `AttributeError: 'RobertaTokenizerFast' object has no attribute '_convert_token_to_id'`

What's the SNR for noise perturbation?

No SNR measurement is conducted, because the noise is controlled by users. Users can control the range, strength, and types.

issue with the input length in "ContextualWordEmbsForSentenceAug" model

Length is calculated after tokenization, so it can be longer than `len(text.split())`.
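To illustrate why the two counts differ, here is a minimal sketch. `toy_subword_tokenize` is a hypothetical stand-in written for this example; it is not the tokenizer that nlpaug or any real model uses, and real subword tokenizers split text differently. It only shows that splitting punctuation and long words into pieces yields more tokens than whitespace splitting.

```python
import re


def toy_subword_tokenize(text, max_piece_len=4):
    """Hypothetical subword tokenizer, for illustration only."""
    tokens = []
    # Separate runs of word characters from individual punctuation marks.
    for piece in re.findall(r"\w+|[^\w\s]", text):
        # Break long pieces into chunks; mark continuation chunks with "##".
        for start in range(0, len(piece), max_piece_len):
            chunk = piece[start:start + max_piece_len]
            tokens.append(chunk if start == 0 else "##" + chunk)
    return tokens


text = "Augmentation isn't straightforward."
word_count = len(text.split())                  # whitespace-separated words
token_count = len(toy_subword_tokenize(text))   # pieces after tokenization
print(word_count, token_count)
```

A length limit that is checked against the token count can therefore be hit by a text whose whitespace word count is well below it.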

Is it possible to use character aug with all the actions (substitute, insert, delete, swap) at the same time?

Technically, it is doable. However, why do you want to generate this kind of synthetic data? Generating meaningful synthetic data helps to improve model performance. You may use Flow and...
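As a rough illustration of what applying all four character actions to one word would look like, here is a plain-Python sketch. It does not use nlpaug, and every function name in it is hypothetical; it only shows the four actions applied one after another to the same word.

```python
import random
import string


def substitute(chars, rng):
    # Replace one character with a random lowercase letter.
    chars[rng.randrange(len(chars))] = rng.choice(string.ascii_lowercase)


def insert(chars, rng):
    # Insert a random lowercase letter at a random position.
    chars.insert(rng.randrange(len(chars) + 1), rng.choice(string.ascii_lowercase))


def delete(chars, rng):
    # Remove one character, but never empty the word.
    if len(chars) > 1:
        del chars[rng.randrange(len(chars))]


def swap(chars, rng):
    # Swap two adjacent characters.
    if len(chars) > 1:
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]


def apply_all_actions(word, seed=0):
    """Apply substitute, insert, delete, and swap once each, in that order."""
    if not word:
        return word
    rng = random.Random(seed)  # seeded so the result is reproducible
    chars = list(word)
    for action in (substitute, insert, delete, swap):
        action(chars, rng)
    return "".join(chars)


print(apply_all_actions("augmentation"))
```

Stacking every action on a single word can distort it heavily, which is the concern raised in the reply above about whether such synthetic data stays meaningful.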

'str' object has no attribute 'w2v' when model='fasttext'

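The `model` parameter is designed to reuse a model that nlpaug has already loaded, not to pass in a pre-trained model. I suggest using the following code, which points `model_path` at the pre-trained model file instead.

```
import nlpaug.augmenter.word as naw

# Point model_path at the pre-trained fastText file
aug = naw.WordEmbsAug(model_type='fasttext', model_path='train_ft.bin', action="substitute")
augmented_text = aug.augment(text)
```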

Chinese augmentation support

I have not tried it yet, but I think it is doable, given that you have the corresponding model and tokenizer.