Edward Ma
For synonyms, you may try the following code, which leverages WordNet/PPDB (you may download the PPDB source from [here](http://paraphrase.org/#/download)) as the backbone. On the other hand, you may consider using a deep...
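For illustration, here is a minimal sketch of that kind of synonym replacement. A toy synonym table stands in for the real WordNet/PPDB lookup, and the function name and table are assumptions for the example, not part of the library:

```python
import random

# Toy synonym table standing in for WordNet/PPDB lookups; a real
# augmenter would query those resources instead.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "joyful"],
}

def synonym_augment(text, rng=random.Random(0)):
    """Replace each word that has a synonym entry with a randomly
    chosen synonym; words without an entry pass through unchanged."""
    out = []
    for word in text.split():
        candidates = SYNONYMS.get(word.lower())
        out.append(rng.choice(candidates) if candidates else word)
    return " ".join(out)
```

With nlpaug itself, this roughly corresponds to `naw.SynonymAug(aug_src='wordnet').augment(text)`, where the heavy lifting of looking up candidates is done by the WordNet or PPDB backend.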
Can you add some test cases for that?
I have considered the unexpected sequence problem. If the input is a list (e.g. `["a", "B", "c"]`), only 1 augmented record is generated per input element.
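A minimal sketch of that batch behavior (the `augment` wrapper and `aug_fn` names here are hypothetical, for illustration only):

```python
def augment(data, aug_fn):
    """Apply aug_fn to the input. A list input yields exactly one
    augmented record per element, rather than multiplying the batch."""
    if isinstance(data, list):
        return [aug_fn(x) for x in data]
    return aug_fn(data)
```

So `augment(["a", "B", "c"], str.upper)` returns three records, one per element.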
Right now, it does not cater for part of speech. The assumption is that those models should be able to take care of it.
Consider using the following sample code:
```python
aug_data = []
for group, d in mydataframe.groupby(['class']):
    a_data = aug_wordnet.augment(d["your column"].tolist())
    a_data = pd.DataFrame(a_data, columns=['text'])
    a_data['class'] = group
    aug_data.append(a_data)
aug_data = ...
```
Normalization (e.g. testing --> test) may introduce grammar mistakes, so it may not be good for the Synonym Augmenter. Could you suggest any use case for this augmenter?
Which nlpaug version and gensim version are you using? I am using gensim 4.1.2.
By default, the tokenizer intends to handle the case you mentioned. You may check [here](https://github.com/makcedward/nlpaug/blob/master/nlpaug/util/text/tokenizer.py#L25) for the implementation. If it does not match your use case, you can pass a custom tokenizer and...
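As a sketch, a custom tokenizer/detokenizer pair has the following shape: a callable from text to a token list, and one back from tokens to text. The regex and names here are assumptions for the example; this variant keeps hyphenated words as single tokens:

```python
import re

def custom_tokenizer(text):
    # Keep hyphenated words (e.g. "state-of-the-art") as one token;
    # split off punctuation as separate tokens.
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)

def custom_reverse_tokenizer(tokens):
    # Naive detokenizer: join tokens with single spaces.
    return " ".join(tokens)
```

Assuming the word augmenters accept these via their `tokenizer` and `reverse_tokenizer` constructor arguments, you would pass both callables when building the augmenter.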
Let's discuss in the above thread.
There is no threshold. As long as a token is replaced by [MASK], it will be replaced by any other candidate word unless there are no candidates.
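A toy sketch of that behavior, where the candidate list stands in for the model's predictions (the function and its signature are assumptions for illustration):

```python
import random

def fill_masks(tokens, candidates, rng=random.Random(0)):
    """Replace every [MASK] token with a randomly chosen candidate.
    There is no score threshold, so a mask survives only when the
    candidate list is empty."""
    return [rng.choice(candidates) if tok == "[MASK]" and candidates else tok
            for tok in tokens]
```

For example, with an empty candidate list the [MASK] token is kept as-is; otherwise some candidate always wins.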