nlpaug
Data augmentation for NLP
aug = naw.WordEmbsAug(model_type='word2vec', model_path=model_path, action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")

Traceback ends in /usr/local/lib/python3.7/dist-packages/nlpaug/augmenter/word/word_embs.py, in __init__(self, model_type, model_path, model, action, name, aug_min, aug_max, aug_p, top_k, n_gram_separator, ...
In the example notebook https://github.com/makcedward/nlpaug/blob/master/example/textual_augmenter.ipynb, nas.AbstSummAug is used with a num_beam parameter that doesn't exist.
I am trying to use the word2vec embedding, but I get a 'Word2VecKeyedVectors' object has no attribute 'index_to_key' error. I implemented the code exactly as it is in the repository, how...
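This error typically comes from a gensim version mismatch: gensim 4.0 renamed the `KeyedVectors.index2word` attribute to `index_to_key`, so code expecting one API fails against a model loaded with the other. A minimal compatibility shim (sketched below with stub classes, no gensim required) covers both:

```python
def vocab_words(kv):
    """Return the vocabulary list from a KeyedVectors-like object,
    handling both gensim >= 4.0 (index_to_key) and < 4.0 (index2word)."""
    if hasattr(kv, "index_to_key"):   # gensim >= 4.0 attribute name
        return kv.index_to_key
    return kv.index2word              # gensim < 4.0 attribute name

# Minimal stand-ins for demonstration only (not real gensim objects):
class OldKV:
    index2word = ["hello", "world"]

class NewKV:
    index_to_key = ["hello", "world"]

print(vocab_words(OldKV()))  # ['hello', 'world']
print(vocab_words(NewKV()))  # ['hello', 'world']
```

In practice, upgrading gensim to match the version nlpaug expects is usually the simpler fix; the shim is only useful when pinning an old gensim is unavoidable.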
Hi, I had some problems when using word-embedding models from other languages due to the file format. Currently we have only three model options, each with fixed format parameters...
Error/warning: Input length of input_ids is 1836, but ``max_length`` is set to 1200. This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``. I am getting this...
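The warning appears when the tokenized input is longer than the generation `max_length`. Besides raising `max_length` (as the message suggests), one workaround is to split long documents into chunks that fit the budget and process each chunk separately. A rough sketch, using word count as a stand-in for the true token count (a real solution would count tokens with the model's tokenizer):

```python
def chunk_text(text, max_words=400):
    """Split text into chunks of at most max_words words.
    Word count is only a rough proxy for the model's token limit."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

long_text = " ".join(["word"] * 1000)
chunks = chunk_text(long_text, max_words=400)
print(len(chunks))  # 3 chunks: 400 + 400 + 200 words
```

Each chunk can then be passed to the augmenter individually and the results concatenated.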
Hello, I have a similar issue to one someone else asked about. I have a dataframe with a text column and a classes column. I would like to augment the text column based...
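A common pattern for this kind of labeled-data augmentation is to generate augmented copies of each text while carrying the class label along unchanged. A minimal sketch using plain dicts, where `fake_augment` is a stand-in for a real nlpaug augmenter's `.augment()` call:

```python
def fake_augment(text):
    # Placeholder for a real call such as aug.augment(text)
    return text.upper()

rows = [
    {"text": "good movie", "label": "pos"},
    {"text": "bad movie", "label": "neg"},
]

# Augment each text; the label travels with its augmented copy.
augmented = [{"text": fake_augment(r["text"]), "label": r["label"]}
             for r in rows]
combined = rows + augmented
print(len(combined))  # 4
```

With pandas the same idea would map the augmenter over the text column and concatenate the augmented frame (with labels preserved) onto the original.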
I think the [UNK] token, used for tokens unknown to the model, interferes with its use as a temporary placeholder for the provided stopwords. In example 1, there is one...
How can I generate more than one augmented sample with the Back-Translation method? The current implementation of `BackTranslationAug` only supports generating a single text, but we could change the decoding strategy...
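The idea behind the request: back-translation can yield several distinct outputs if the reverse-translation step returns multiple candidates (e.g. via sampling or a `num_return_sequences`-style decoding option in the underlying translation models). A minimal sketch of that shape, with stub translate functions standing in for the real MarianMT/FSMT calls:

```python
def to_foreign(text):
    # Stub forward translation (real code would call a translation model)
    return text[::-1]

def to_source(text, num_return_sequences=1):
    # Stub reverse translation returning several candidates; a real
    # implementation would get these from the model's decoding step.
    base = text[::-1]
    return [base if i == 0 else f"{base} (variant {i})"
            for i in range(num_return_sequences)]

def back_translate(text, n=3):
    """Return n back-translated variants of text."""
    return to_source(to_foreign(text), num_return_sequences=n)

print(back_translate("hello world", n=3))
# ['hello world', 'hello world (variant 1)', 'hello world (variant 2)']
```

The stubs only illustrate the control flow; the substance of the change would be in how the reverse model decodes multiple sequences.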
I am trying to use `ContextualWordEmbsAug` with a custom BERT model. There are two problems when trying to use a custom transformer that I have trained using the HuggingFace API....
Hi, I added Electra to the list of models that can be used for Contextual Word Embeddings Augmentation. Electra has the same special tokens as BERT, so I really just copied...