twewy-discord-chatbot
Type error running "main(trn_df, val_df)"
`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
10 frames
/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_fast.py in _batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose)
    427 )
    428
--> 429 encodings = self._tokenizer.encode_batch(
    430 batch_text_or_text_pairs,
    431 add_special_tokens=add_special_tokens,
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]`
**I'm working with a comma-separated CSV file. All the previous steps completed fine. The CSV file has about 20,000 lines, and I've tried using DialoGPT small and medium.
Any help would be much appreciated!**
Hello there! I ran into a similar issue.
It seems the `TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]` error appears when the tokenizer finds empty cells in the CSV file.
I'd recommend checking for these empty cells and either deleting those rows or adding text to them.
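One way to check for empty cells is sketched below with pandas. The column names and sample rows are made up for illustration, and the in-memory CSV stands in for the real file; the point is only that an empty cell is read as `NaN`, which is a float rather than a string, and that `isna()` can locate it.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = "name,line\nNeku,Hello\nShiki,\nJoshua,Hi there\n"
data = pd.read_csv(io.StringIO(csv_text))

# An empty cell is loaded as NaN (a float), not as an empty string.
empty_rows = data[data.isna().any(axis=1)]

# Show which rows contain at least one empty cell.
print(empty_rows)
```

With a real file, `pd.read_csv("your_file.csv")` would replace the `io.StringIO` call; the file name here is a placeholder.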
I also ran into this, and my fix was to use `str(x)` instead of `x` in `tokenizer.encode(x)` inside the function `construct_conv()`.
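A rough sketch of why the `str(x)` change helps is shown below. This is not the notebook's actual `construct_conv()`; `fake_encode` and `EOS_ID` are hypothetical stand-ins so the example runs without downloading a tokenizer. It assumes only that the real tokenizer rejects non-string input, as the traceback suggests.

```python
EOS_ID = 0  # hypothetical end-of-sequence token id


def fake_encode(text):
    """Hypothetical stand-in for tokenizer.encode: rejects non-string input."""
    if not isinstance(text, str):
        raise TypeError("input must be a string")
    return [ord(ch) for ch in text]


def construct_conv(row, encode=fake_encode, eos_id=EOS_ID):
    """Simplified sketch: encode each cell and append an end-of-sequence id."""
    conv = []
    for x in row:
        # str(x) turns a NaN cell (a float) into the text "nan",
        # so the encoder no longer raises TypeError.
        conv.extend(encode(str(x)) + [eos_id])
    return conv


# A row where the second cell was empty in the CSV and loaded as NaN.
row = ["Hi", float("nan")]
tokens = construct_conv(row)
print(tokens)
```

Note that this workaround keeps the row and feeds the literal text "nan" to the model, so removing or filling the empty cells is probably the cleaner fix when the data can be edited.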
This is really helpful! For anyone who has the same issue, add these lines before the 'CHARACTER_NAME' line:
data.dropna(inplace=True)
data = data.reset_index(drop=True)
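The effect of those two calls can be seen on a small made-up DataFrame (the column names and values are hypothetical). Dropping rows leaves gaps in the index, and `reset_index(drop=True)` renumbers the rows from zero, which presumably matters if later steps look rows up by consecutive index.

```python
import pandas as pd

# Hypothetical data with one empty cell (None is treated as missing).
data = pd.DataFrame(
    {
        "name": ["Neku", "Shiki", "Joshua"],
        "line": ["Hello", None, "Hi there"],
    }
)

# Remove every row that contains a missing value.
data.dropna(inplace=True)
index_after_drop = list(data.index)  # the index now has a gap

# Renumber the remaining rows 0..n-1 and discard the old index.
data = data.reset_index(drop=True)
index_after_reset = list(data.index)

print(index_after_drop, index_after_reset)
```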