twewy-discord-chatbot

TypeError when running `main(trn_df, val_df)`

Open · jdakillah opened this issue 2 years ago · 3 comments

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 main(trn_df, val_df)

10 frames
/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_fast.py in _batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose)
    427         )
    428
--> 429         encodings = self._tokenizer.encode_batch(
    430             batch_text_or_text_pairs,
    431             add_special_tokens=add_special_tokens,

TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
```

**I'm working with a comma-separated CSV file. All the previous steps completed fine. The CSV file has about 20,000 lines, and I've tried both DialoGPT-small and DialoGPT-medium.

Any help would be much appreciated!**

jdakillah · Jan 16 '23 18:01

Hello there! I ran into a similar issue.

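It seems the `TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]` error appears when the CSV file contains empty cells. This is likely because pandas reads an empty cell as `NaN`, which is a float rather than a string, and the fast tokenizer only accepts text.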

I'd recommend checking for these empty cells and either deleting those rows or filling them with text.
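If you want to find those cells before loading the data, here is a minimal sketch using only Python's standard `csv` module. It assumes the file has a header row; the function names and the `name`/`line` columns in the usage example are hypothetical, not taken from the repo. Note that it also treats whitespace-only cells as empty, which is stricter than pandas' default.

```python
import csv
import io


def find_empty_cells(csv_text):
    """Return (row_number, column_name) pairs for every blank cell.

    row_number is 1-based over data rows (header excluded; fully blank
    lines are skipped by csv.DictReader and not counted).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    empty = []
    for row_num, row in enumerate(reader, start=1):
        for col in reader.fieldnames:
            value = row.get(col)
            # A missing trailing field comes back as None, a blank one as "".
            if value is None or value.strip() == "":
                empty.append((row_num, col))
    return empty


def drop_rows_with_empty_cells(csv_text):
    """Return the CSV text with every row containing a blank cell removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        values = [row.get(col) for col in reader.fieldnames]
        if all(v is not None and v.strip() for v in values):
            writer.writerow(row)
    return out.getvalue()
```

For a real file you could call `find_empty_cells(open("your_data.csv", encoding="utf-8").read())` (the filename is a placeholder) and either fix the reported cells by hand or write out the result of `drop_rows_with_empty_cells()`.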

LolGamer1210 · Jan 18 '23 22:01

I also ran into this. My fix was to use `str(x)` instead of `x` in `tokenizer.encode(x)` inside the `construct_conv()` function.
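To see why `str(x)` helps, here is a self-contained sketch that does not need `transformers` installed. `StubTokenizer` is a hypothetical stand-in for the Hugging Face fast tokenizer: like the real one, it rejects non-string input, but it simply maps characters to code points instead of producing real token ids. The `construct_conv()` shown is a simplified illustration, not the repo's actual function, which may order or format the turns differently.

```python
class StubTokenizer:
    """Hypothetical stand-in for a fast tokenizer (illustration only)."""

    eos_token_id = 0

    def encode(self, text):
        # The real fast tokenizer also refuses non-str input such as NaN floats.
        if not isinstance(text, str):
            raise TypeError("TextEncodeInput must be a string")
        return [ord(ch) for ch in text]


def construct_conv(row, tokenizer):
    """Encode each cell of a row and join them, ending each turn with EOS."""
    ids = []
    for x in row:
        # str(x) turns a NaN float (an empty CSV cell) into the string "nan".
        ids.extend(tokenizer.encode(str(x)))
        ids.append(tokenizer.eos_token_id)
    return ids


tok = StubTokenizer()
print(construct_conv(["hi", float("nan")], tok))
```

One caveat: `str(float("nan"))` is `"nan"`, so this fix makes the model train on the literal word "nan" wherever a cell was empty. Dropping the incomplete rows instead (the `dropna` approach) avoids that.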


minhcrafters · May 28 '23 14:05

> `str(x)`

This is really helpful! If anyone runs into the same issue, add this before the `CHARACTER_NAME` line:

```python
data.dropna(inplace=True)           # drop every row that has an empty (NaN) cell
data = data.reset_index(drop=True)  # renumber the remaining rows 0..n-1
```
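The `reset_index(drop=True)` call matters if later code looks rows up by index label: after `dropna()` the index has gaps, so a label lookup can raise `KeyError`. A minimal sketch, assuming a hypothetical `build_contexts()` helper that pairs each line with the `n` lines before it via `data.loc[j, "line"]` and a hypothetical `line` column. Context-building loops in DialoGPT fine-tuning notebooks often look roughly like this, but the repo's exact code may differ.

```python
import pandas as pd


def build_contexts(data, n=2):
    """Hypothetical helper: pair each line with the n lines before it.

    Looks rows up by index label, so it assumes the index runs
    0..len(data)-1 with no gaps.
    """
    rows = []
    for i in range(n, len(data)):
        # Current line first, then the n previous lines (most recent first).
        rows.append([data.loc[j, "line"] for j in range(i, i - n - 1, -1)])
    columns = ["response"] + [f"context/{k}" for k in range(n)]
    return pd.DataFrame(rows, columns=columns)


raw = pd.DataFrame({"name": ["A", "B", "A", "B"],
                    "line": ["Hi.", None, "Hey.", "Bye."]})

clean = raw.dropna().reset_index(drop=True)  # index is 0, 1, 2 again
print(build_contexts(clean, n=1))

# Without reset_index the index is 0, 2, 3, so label 1 no longer exists
# and build_contexts(raw.dropna(), n=1) raises KeyError.
```

Note that `dropna()` with no arguments drops a row if any column is empty; `dropna(subset=["line"])` would only look at the dialogue column.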


villinale · Aug 21 '23 09:08