Edward Ma

27 comments by Edward Ma

For synonyms, you may try the following code, which leverages WordNet/PPDB as the backbone (you can download the PPDB source from [here](http://paraphrase.org/#/download)). Alternatively, you may consider using a deep...
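The general idea behind a WordNet/PPDB-backed synonym augmenter is: look each word up in a synonym resource and, with some probability, swap it for one of the synonyms. The sketch below is a toy illustration of that idea, not nlpaug's implementation. The `synonyms` dictionary is a hypothetical stand-in for a real WordNet or PPDB lookup, and `aug_p` is only named after the nlpaug parameter of the same name. In nlpaug itself this is typically done with `nlpaug.augmenter.word.SynonymAug` (`aug_src='wordnet'`, or `aug_src='ppdb'` together with `model_path` pointing at the downloaded PPDB file).

```python
import random


def synonym_replace(text, synonyms, aug_p=0.3, rng=None):
    """Replace words with a synonym drawn from a lookup table (toy sketch).

    synonyms: dict mapping a lowercase word to a list of synonyms; a
              hypothetical stand-in for a WordNet or PPDB lookup.
    aug_p:    probability of replacing each word that has synonyms.
    """
    rng = rng or random.Random()
    out = []
    for word in text.split():
        options = synonyms.get(word.lower(), [])
        # Only words that have at least one synonym are candidates
        if options and rng.random() < aug_p:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


table = {"film": ["movie"], "good": ["fine"]}
print(synonym_replace("the film was good", table, aug_p=1.0))
```

With `aug_p=1.0` and a single synonym per word, the output is deterministic; lower values replace only a random subset of the eligible words.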

This behavior takes the unexpected sequence problem into account. If the input is a list (e.g. ["a", "B", "c"]), only 1 augmented record is produced per input.
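If you need several augmented records per input while passing a list, one option is to call the augmenter repeatedly and collect the results per position. This is a minimal sketch, assuming the augmenter returns exactly one output per input in the same order, as described above; `augment_n_per_input` and `stub_augment` are hypothetical helpers, not part of nlpaug.

```python
import itertools


def augment_n_per_input(texts, augment_list, n):
    """Collect n augmented variants for every text in `texts`.

    augment_list is assumed to take a list of texts and return exactly
    one augmented text per input, in the same order.
    """
    variants = [[] for _ in texts]
    for _ in range(n):
        outputs = augment_list(texts)
        # One output per input, so outputs[i] always belongs to texts[i]
        for i, out in enumerate(outputs):
            variants[i].append(out)
    return variants


# Hypothetical stub augmenter: tags each text with the call number
_calls = itertools.count()


def stub_augment(texts):
    k = next(_calls)
    return [f"{t}-{k}" for t in texts]


print(augment_n_per_input(["a", "B", "c"], stub_augment, n=2))
```

Keeping exactly one output per input means position `i` in the result can never be confused with another input, which is the ordering issue the one-record-per-input behavior avoids.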

Right now, it does not take part of speech into account. The assumption is that the models themselves should be able to handle it.

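Consider using the following sample code:

```
import pandas as pd

# Assumes `mydataframe` (with a 'class' column) and `aug_wordnet` (an augmenter) already exist
aug_data = []
# Augment each class separately so the label can be attached to the new rows
for group, d in mydataframe.groupby('class'):
    a_data = aug_wordnet.augment(d["your column"].tolist())
    a_data = pd.DataFrame(a_data, columns=['text'])
    a_data['class'] = group
    aug_data.append(a_data)
aug_data =...
```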
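The same per-class idea can be sketched without pandas: group the texts by label, augment each group, and attach the group's label to every new text. This is a minimal sketch; `augment_per_class` and `fake_augment` are hypothetical names, and `fake_augment` merely upper-cases text as a stand-in for a real augmenter such as `aug_wordnet.augment`.

```python
from collections import defaultdict


def augment_per_class(rows, augment):
    """Augment texts class by class and label each new text with its class.

    rows:    list of (text, label) pairs.
    augment: callable taking a list of texts and returning a list of
             augmented texts (hypothetical stand-in for an augmenter).
    """
    by_class = defaultdict(list)
    for text, label in rows:
        by_class[label].append(text)

    augmented = []
    # Labels are visited in first-seen order (dicts preserve insertion order)
    for label, texts in by_class.items():
        for new_text in augment(texts):
            augmented.append((new_text, label))
    return augmented


def fake_augment(texts):
    # Hypothetical augmenter: upper-cases each text instead of swapping synonyms
    return [t.upper() for t in texts]


rows = [("good movie", "pos"), ("bad plot", "neg"), ("great cast", "pos")]
print(augment_per_class(rows, fake_augment))
```

Grouping first means the augmenter never sees texts from different classes together, so every generated text inherits the correct label.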

Normalization (e.g. testing --> test) may introduce grammatical mistakes, so it may not be a good fit for the Synonym Augmenter. Could you suggest a use case for this augmenter?
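To see how normalization can break grammar, here is a deliberately crude sketch: it strips a trailing "ing" from longer words, which is NOT a real stemmer or lemmatizer and not anything nlpaug does. Putting the normalized word back into the sentence produces an ungrammatical result.

```python
import re


def naive_normalize(sentence):
    # Crude suffix stripping (not a real stemmer or lemmatizer):
    # drop a trailing "ing" when at least three word characters precede it.
    return re.sub(r"\b(\w{3,})ing\b", r"\1", sentence)


print(naive_normalize("I am testing the code"))
```

The output, "I am test the code", is exactly the kind of grammar mistake that makes normalized forms awkward as augmentation output.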

Which nlpaug and gensim versions are you using? I am using gensim 4.1.2.
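One way to report installed versions is the standard-library `importlib.metadata` module (Python 3.8+). This small sketch wraps it in a hypothetical helper, `installed_version`, that returns `None` instead of raising when a package is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for pkg in ("nlpaug", "gensim"):
    print(pkg, installed_version(pkg) or "not installed")
```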

By default, the tokenizer is intended to handle the case you mentioned. You can check the implementation [here](https://github.com/makcedward/nlpaug/blob/master/nlpaug/util/text/tokenizer.py#L25). If it does not match your use case, you can pass a custom tokenizer and...
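As an illustration of what a custom tokenizer could look like, here is a regex-based pair that keeps hyphenated words and contractions together and splits off other punctuation, plus a matching function that joins tokens back into text. This is a sketch under assumptions: the function names are hypothetical, and it assumes your nlpaug version lets you pass such callables to a word augmenter (for example via `tokenizer` and `reverse_tokenizer` arguments); check the augmenter's signature before relying on that.

```python
import re

# Words may contain internal hyphens or apostrophes; any other
# non-space, non-word character becomes its own token.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def custom_tokenizer(text):
    return TOKEN_RE.findall(text)


def custom_reverse_tokenizer(tokens):
    # Join with spaces, then remove the space in front of punctuation tokens
    return re.sub(r"\s+([^\w\s])", r"\1", " ".join(tokens))


tokens = custom_tokenizer("state-of-the-art model, isn't it?")
print(tokens)
print(custom_reverse_tokenizer(tokens))
```

The reverse function is lossy for opening brackets or quotes (it also removes the space before them), so adapt it if your text contains those.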

Let's discuss this in the thread above.

There is no threshold. Once a word is replaced by [MASK], it will be replaced by another candidate word unless there are no candidates.
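The "no threshold" behavior can be sketched as: mask the word, ask a model for candidates, and take the best-scoring candidate however low its score is; keep the original word only when no candidate exists. This is a conceptual sketch, not nlpaug's code. `predict` is a hypothetical stand-in for a masked language model, and excluding the original word from the candidates is an assumption based on "another candidate word".

```python
def substitute_without_threshold(tokens, positions, predict):
    """Substitute the words at `positions` using mask-filling candidates.

    predict(masked_tokens, i) returns (candidate, score) pairs for the
    [MASK] at index i (hypothetical stand-in for a masked language model).
    No score threshold is applied: the top candidate is always used.
    """
    result = list(tokens)
    for i in positions:
        masked = list(result)
        masked[i] = "[MASK]"
        # Assumption: the original word itself does not count as a candidate
        candidates = [(w, s) for w, s in predict(masked, i) if w != tokens[i]]
        if candidates:
            result[i] = max(candidates, key=lambda ws: ws[1])[0]
        # With no candidates, the original word is left in place
    return result


def toy_predict(masked, i):
    # Hypothetical scores: even a very low score is accepted
    return {1: [("fast", 0.01), ("quick", 0.02)], 2: []}.get(i, [])


print(substitute_without_threshold(["a", "quick", "fox"], [1, 2], toy_predict))
```

Here "quick" becomes "fast" despite its score of 0.01, while "fox" stays because the model offered no candidates.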