easy_seq2seq
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 1: invalid start byte
martin@ubuntu:~/Downloads/easy_seq2seq-master$ python execute.py
>> Mode : train
Preparing data in working_dir/
Tokenizing data in data/test.enc
Traceback (most recent call last):
File "execute.py", line 303, in <module>
train()
File "execute.py", line 117, in train
enc_train, dec_train, enc_dev, dec_dev, _, _ = data_utils.prepare_custom_data(gConfig['working_directory'],gConfig['train_enc'],gConfig['train_dec'],gConfig['test_enc'],gConfig['test_dec'],gConfig['enc_vocab_size'],gConfig['dec_vocab_size'])
File "/home/martin/Downloads/easy_seq2seq-master/data_utils.py", line 147, in prepare_custom_data
data_to_token_ids(test_enc, enc_dev_ids_path, enc_vocab_path, tokenizer)
File "/home/martin/Downloads/easy_seq2seq-master/data_utils.py", line 125, in data_to_token_ids
normalize_digits)
File "/home/martin/Downloads/easy_seq2seq-master/data_utils.py", line 104, in sentence_to_token_ids
words = basic_tokenizer(sentence)
File "/home/martin/Downloads/easy_seq2seq-master/data_utils.py", line 51, in basic_tokenizer
word = str.encode(space_separated_fragment)
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 1: invalid start byte
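For context, a minimal sketch (written for Python 3) of why byte 0x96 triggers this error and two possible ways around it. It assumes, without confirmation from the data files, that the text was saved in Windows-1252, where 0x96 is an en dash; the sample bytes and variable names are hypothetical and are not taken from data_utils.py.

```python
# Hypothetical sample: byte 0x96 at position 1, as in the reported error.
raw = b"a\x96b"

# 0x96 falls in the UTF-8 continuation-byte range, so it cannot start a character.
try:
    raw.decode("utf-8")
    error_message = ""
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Assumption: the source file is Windows-1252, where 0x96 maps to an en dash (U+2013).
as_cp1252 = raw.decode("cp1252")

# Alternative: stay with UTF-8 and replace undecodable bytes with U+FFFD.
as_replaced = raw.decode("utf-8", errors="replace")

print(error_message)
print(repr(as_cp1252), repr(as_replaced))
```

If the real encoding of the data files is different, the cp1252 line would produce wrong characters, so it may be safer to check the files' encoding first or fall back to errors="replace".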
Then I switched from Python 2.7 to Python 3:
martin@ubuntu:~/Downloads/easy_seq2seq-master$ python3 execute.py
Mode : train
Preparing data in working_dir/
Tokenizing data in data/test.dec
Creating 3 layers of 256 units.
WARNING:tensorflow:At least two cells provided to MultiRNNCell are the same object and will share weights.
Traceback (most recent call last):
File "execute.py", line 301, in