Edward Ma comments

Results: 27 comments by Edward Ma

How to specify the percent of words in the sentence that will be changed?

For synonyms, you may try the following code, which leverages WordNet/PPDB as the backbone (you may download the PPDB source from [here](http://paraphrase.org/#/download)). On the other hand, you may consider using a deep...
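To illustrate what "percent of words changed" means in practice, here is a minimal standalone sketch, not nlpaug's implementation. The `SYNONYMS` table and the `replace_fraction` helper are hypothetical and exist only for this example; the `aug_p` name mirrors the parameter that nlpaug's word augmenters are understood to expose for this purpose.

```python
import random

# Hypothetical toy synonym table; a real setup would query WordNet or PPDB.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "big": ["large", "huge"],
}

def replace_fraction(sentence, aug_p=0.3, seed=None):
    """Replace about aug_p of the sentence's words, limited to words with a synonym."""
    rng = random.Random(seed)
    tokens = sentence.split()
    # Only words with a known synonym are candidates for replacement.
    candidates = [i for i, t in enumerate(tokens) if t.lower() in SYNONYMS]
    if not candidates or aug_p <= 0:
        return sentence
    # Number of words to change: a share of the sentence length, at least 1,
    # capped by how many candidates exist.
    n_changes = min(len(candidates), max(1, round(len(tokens) * aug_p)))
    for i in rng.sample(candidates, n_changes):
        tokens[i] = rng.choice(SYNONYMS[tokens[i].lower()])
    return " ".join(tokens)
```

With a 7-word sentence and `aug_p=0.3`, this sketch changes 2 words; raising `aug_p` raises that count until the pool of replaceable words runs out.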

Allow using word embedding models with different formats

Can you add some test cases for that?

How to generate more than 1 augmented samples using Back-Translation method?

Considering the unexpected sequence problem, if the input is a list (e.g. ["a", "B", "c"]), it will only generate 1 augmented record per input.

Part of Speech to map synonyms

Right now, it does not cater for part of speech. The assumption is that those models should be able to take care of it.

Use NLP with dataframes and labels

Consider using the following sample code:

```
aug_data = []
# Augment each class separately so the label can be re-attached afterwards.
for group, d in mydataframe.groupby('class'):
    a_data = aug_wordnet.augment(d["your column"].tolist())
    a_data = pd.DataFrame(a_data, columns=['text'])
    a_data['class'] = group
    aug_data.append(a_data)
aug_data =...
```
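For a version that runs end to end, the following is a rough sketch of the same group-by-label idea. It assumes pandas is installed; `augment_by_class` and `fake_augment` are hypothetical names introduced here, and `fake_augment` is only a stand-in so the example does not depend on nlpaug. The final `pd.concat` step is an assumption about how the per-class frames would be combined.

```python
import pandas as pd

def augment_by_class(df, augment_fn, text_col="text", label_col="class"):
    """Augment text_col group by group and carry each group's label over."""
    parts = []
    for label, group in df.groupby(label_col):
        # augment_fn takes a list of strings and returns a list of strings.
        augmented = augment_fn(group[text_col].tolist())
        part = pd.DataFrame({text_col: augmented})
        part[label_col] = label
        parts.append(part)
    # Stack the per-class frames into one labelled frame.
    return pd.concat(parts, ignore_index=True)

# Stand-in augmenter so the sketch runs without nlpaug installed.
def fake_augment(texts):
    return [t.upper() for t in texts]

df = pd.DataFrame({
    "text": ["good film", "bad film", "great plot"],
    "class": ["pos", "neg", "pos"],
})
result = augment_by_class(df, fake_augment)
```

In real use, `augment_fn` would be something like the `augment` method of an nlpaug augmenter, provided it returns one output string per input string.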

Are you planning to do 2-word synonym augmentation?

Normalization (e.g. testing --> test) may introduce grammar mistakes. It may not be good for the Synonym Augmenter. Could you suggest a use case for this augmenter?

Error in using Glove embeddings - load_word2vec_format() got an unexpected keyword argument 'no_header'

Which nlpaug version and gensim version are you using? I am using gensim 4.1.2.

Augmenting a sentence is not persisting whitespaces between certain punctuation

By default, the tokenizer is intended to behave as you described. You may check [here](https://github.com/makcedward/nlpaug/blob/master/nlpaug/util/text/tokenizer.py#L25) for the implementation. If it does not match your use case, you can pass a custom tokenizer and...
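As a rough idea of what a custom tokenizer could look like, here is a minimal sketch of a pair that keeps whitespace as tokens so the original spacing around punctuation survives a round trip. The function names are illustrative, and passing them to an augmenter via `tokenizer=` and `reverse_tokenizer=` arguments is an assumption about nlpaug's constructor that has not been verified here.

```python
import re

# Split into runs of word characters, runs of whitespace,
# and single punctuation marks, so no character is dropped.
TOKEN_PATTERN = re.compile(r"\w+|\s+|[^\w\s]")

def tokenizer(text):
    """Return tokens including whitespace runs, preserving the exact spacing."""
    return TOKEN_PATTERN.findall(text)

def reverse_tokenizer(tokens):
    """Rebuild the text; spacing is already stored in the tokens themselves."""
    return "".join(tokens)
```

Because whitespace runs become tokens of their own, an augmenter using this pair may need to skip them when picking words to change (for example through a stopword setting).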

Integration with Flair

Let's discuss this in the above thread.

what's the probability threshold of ContextualWordEmbsAug?

There is no threshold. Once a token is replaced by [MASK], it will be replaced by one of the other word candidates, unless there are no candidates.
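The behaviour described above can be sketched roughly as follows. This is an illustration of the "no threshold" logic only, not the library's code: `fill_masks` is a hypothetical helper, and `propose` stands in for a masked language model that returns candidate words for a masked position.

```python
import random

MASK = "[MASK]"

def fill_masks(tokens, masked_positions, propose, rng=None):
    """Replace each masked token with a proposed candidate; no score cutoff."""
    rng = rng or random.Random()
    out = list(tokens)
    for i in masked_positions:
        context = out[:i] + [MASK] + out[i + 1:]
        # Drop the original word so the result is an actual replacement.
        candidates = [c for c in propose(context, i) if c != tokens[i]]
        if candidates:
            out[i] = rng.choice(candidates)
        # With no candidates, the original token is kept unchanged.
    return out
```

Note that no probability is compared against a cutoff anywhere: a masked position is changed whenever at least one alternative word is proposed.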