Edward Ma

Results 27 comments of Edward Ma

This library does not yet support generating augmented data for NER problems. I can enhance it if there are any research papers related to this problem.

Thanks for your contribution. Please share the corresponding papers with me so I can check whether it can be supported.

@Zylatis Thank you for your input. A DSL could be one solution for that. I will work out how nlpaug can support a DSL. Before that, you may consider leveraging...

Thank you, Omar. I will review the approach.

Is this the exception? If yes, it will be fixed soon: `AttributeError: 'RobertaTokenizerFast' object has no attribute '_convert_token_to_id'`

No measurement is conducted, as it is controlled by users. Users can control the range, strength, and types.

Length is calculated after tokenization, so it will be longer than `len(text.split())`.
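To illustrate why a token count can exceed a whitespace word count, here is a minimal sketch. The `toy_tokenize` function is a hypothetical stand-in written for this example; it is not nlpaug's tokenizer, and real model tokenizers (which often split words into subword pieces) will produce different counts.

```python
import re


def toy_tokenize(text):
    # Hypothetical stand-in tokenizer (an assumption, not nlpaug's own):
    # runs of word characters and single punctuation marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


text = "Hello, world! It's fine."

# Whitespace splitting keeps punctuation attached to words.
whitespace_count = len(text.split())

# Tokenizing separates punctuation, so the count grows.
token_count = len(toy_tokenize(text))

print(whitespace_count, token_count)
```

A length limit that is enforced on tokens can therefore be reached sooner than a count based on `text.split()` would suggest.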

Technically, it is doable. However, why do you want to generate this kind of synthetic data? Generating meaningful synthetic data helps to improve model performance. You may use Flow and...

"model" parameter is designed to reload nlpaug's loaded model but not pre-trianed model. Suggest using the following code. ``` aug = naw.WordEmbsAug(model_type='fasttext', model_path='train_ft.bin', action="substitute") augmented_text = aug.augment(text) ```

I have not tried it yet, but I think it is doable, given that you have the corresponding model and tokenizer.