easy_seq2seq
Unicode Error
This is the error I get when I try to train the model: UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position 4: ordinal not in range(128)
same problem
I think there might be a Python version problem. Which version are you on?
@vivekkalyanarangan30 same problem here with Python 2.7.12
I think the author wrote the code with Python 3.x. Python 2 and 3 have fundamentally different encoding implementations. Try firing up a quick 3.x Docker container with TensorFlow and running this there.
@vivekkalyanarangan30 I have Python 3.6 installed too. I tried it and got a similar error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte
I'm considering changing Python's encoding.
From what I can see in the error, the problem doesn't seem to be encoding; it is decoding. So try decoding the data as UTF-8: call .decode() with utf-8 on the problematic part.
@vivekkalyanarangan30 .decode() didn't work, but the problem didn't show up again after emptying the 'working_dir' folder that comes with trained vocabularies.
Both files are Windows-1252 encoded, and TensorFlow was raising the error while trying to read them.
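For anyone wondering why the same files fail on both Python versions, here is a minimal sketch (my own illustration, not code from the repository). It checks the bytes quoted in the errors in this thread against UTF-8 and Windows-1252. The helper name decodes_as is made up for this example.

```python
def decodes_as(raw, encoding):
    """Return True if the bytes decode cleanly with the given codec."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Bytes taken from the error messages reported in this thread.
for raw in (b"\x96", b"\x97", b"\xad"):
    print(
        repr(raw),
        "ascii:", decodes_as(raw, "ascii"),
        "utf-8:", decodes_as(raw, "utf-8"),
        "cp1252:", decodes_as(raw, "cp1252"),
    )

# In Windows-1252 these bytes are ordinary punctuation:
# 0x96 is an en dash (U+2013), 0x97 is an em dash (U+2014),
# and 0xad is a soft hyphen (U+00AD).
```

All three bytes fail under ascii and utf-8 but decode under cp1252, which would be consistent with Windows-1252 files being read with the default codec.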
Note: I had to convert all files to UTF-8, and that included the generated files (data/train.enc.ids20000 and data/train.dec.ids20000). So convert the downloaded data first, run once and convert train.enc.ids20000, then run again and convert train.dec.ids20000.
So I can only assume the code that writes those files needs a tweak. Now it says "creating 3 layers of 256 units", so let's see what happens next (many CPU cycles later... :) )
Hello People,
I am getting the same error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 4: ordinal not in range(128)
OS: Ubuntu 16.04.1 x64
Python: 2.7.12
TensorFlow: 0.12.1 (CPU only)
@dragon28 check out my answer and the one from @absentdream!
@legentz
Thanks :) :+1:
Managed to fix those files using the following command:
iconv -f WINDOWS-1252 -t UTF-8//TRANSLIT old_files -o new_files
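If iconv is not available, a rough Python equivalent might look like the sketch below. It assumes the source file really is Windows-1252, as reported in this thread. The function name convert_to_utf8 and the placeholder paths are hypothetical, not part of easy_seq2seq.

```python
import io


def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file to UTF-8, leaving line endings untouched."""
    # newline="" disables newline translation on both read and write.
    with io.open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    import sys

    # Usage: python convert.py old_file new_file
    if len(sys.argv) == 3:
        convert_to_utf8(sys.argv[1], sys.argv[2])
```

Writing to a separate output path keeps the original file around in case the guess about the source encoding turns out to be wrong.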
Hello, I'm having the same issue. I'm running TensorFlow version '0.12.0-rc1' on Ubuntu 16.04.
When I run execute.py on Python 2.7.12, I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 4: ordinal not in range(128)
When I run execute.py on Python 3.5.2, I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 28: invalid start byte
I tried running the code again after emptying the working_dir folder, but I'm still getting the same error.
Just convert the files; that worked for me.
Which files do we have to convert to UTF-8?
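One way to find out is to scan for files that are not valid UTF-8. This is a minimal sketch, assuming the relevant files live under the data and working_dir folders mentioned earlier in this thread. The helper find_non_utf8_files is hypothetical, and it will also report genuinely binary files (model checkpoints, for example), so review the list before converting anything.

```python
import os


def find_non_utf8_files(root):
    """Return sorted paths under root whose bytes are not valid UTF-8."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Read raw bytes so no implicit decoding happens.
            with open(path, "rb") as handle:
                data = handle.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Folder names taken from this thread; adjust to your checkout.
    for folder in ("data", "working_dir"):
        for path in find_non_utf8_files(folder):
            print(path)
```

The sketch reads each file fully into memory, which is simple but may be slow for very large training files.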