easy_seq2seq

Unicode Error

Open karamesh opened this issue 8 years ago • 14 comments

This is the error I get when I try to train the model:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position 4: ordinal not in range(128)

karamesh avatar Jan 28 '17 13:01 karamesh

same problem

randomrandom avatar Feb 03 '17 10:02 randomrandom

I think there might be a python version problem - which version are you on?

vivekkalyanarangan30 avatar Feb 03 '17 10:02 vivekkalyanarangan30

@vivekkalyanarangan30 same problem here with python 2.7.12

legentz avatar Feb 03 '17 12:02 legentz

I think the author wrote the code with Python 3.x. Python 2 and 3 handle text encoding in fundamentally different ways. Try firing up a quick Python 3.x Docker container with TensorFlow and running this there.

vivekkalyanarangan30 avatar Feb 03 '17 12:02 vivekkalyanarangan30

@vivekkalyanarangan30 I have Python 3.6 installed too. I tried it and got a similar error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte

I'm considering changing the Python encoding.

legentz avatar Feb 03 '17 12:02 legentz

From what I can see in the error, the problem doesn't seem to be encoding; it is decoding. So try decoding it as UTF-8: just wrap the problematic part in .decode() with utf-8.
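To illustrate the decoding point, here is a minimal sketch (Python 3 assumed). The sample byte string is made up for illustration. The byte 0x97 from the first traceback is outside the ASCII range and is not a valid UTF-8 start byte, so both of those codecs reject it; the sketch assumes the data might instead be Windows-1252, where 0x97 is a defined character.

```python
# Hypothetical sample: a byte string containing 0x97, as in the traceback.
raw = b"yes \x97 no"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are not valid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_ascii = try_decode(raw, "ascii")    # fails: 0x97 is outside range(128)
as_utf8 = try_decode(raw, "utf-8")     # fails: 0x97 is not a valid start byte
as_cp1252 = try_decode(raw, "cp1252")  # succeeds: 0x97 maps to U+2014

print(as_ascii, as_utf8, repr(as_cp1252))
```

If .decode() with utf-8 fails the same way on the real data, that suggests the bytes were written in some other encoding, and the codec name has to match it.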

vivekkalyanarangan30 avatar Feb 03 '17 12:02 vivekkalyanarangan30

@vivekkalyanarangan30 .decode() didn't work but the problem didn't show up again after emptying the 'working_dir' folder that comes with trained vocabularies.

Both files are Windows-1252 encoded. However, Tensorflow was raising the error while trying to read those files.
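A rough way to check which files are affected, as a sketch only: the helper name find_non_utf8_files is hypothetical and not part of easy_seq2seq. It walks a directory and reports every file whose bytes do not decode as UTF-8. It reads each file fully into memory, and it will also flag genuinely binary files (model checkpoints, for example), so the result needs a manual look.

```python
import os


def find_non_utf8_files(directory):
    """Return sorted paths of files under `directory` whose bytes are not valid UTF-8."""
    bad = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # Read raw bytes so that no implicit decoding happens on open.
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)
```

Calling it on the folder that holds the training data and on working_dir should list the candidates for conversion.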

legentz avatar Feb 05 '17 10:02 legentz

Note: I had to convert all files to UTF-8, and that also included the generated files (data/train.enc.ids20000 as well as data/train.dec.ids20000). So: convert the downloaded data first, run once and convert train.enc.ids20000, then run again and convert train.dec.ids20000.

So I can only assume the bit that writes those files needs a tweak. Now it's at "creating 3 layers of 256 units"... so let's see what happens next (many CPU cycles later... :) )

absentdream avatar Feb 09 '17 00:02 absentdream

Hello People,

I am getting the same error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 4: ordinal not in range(128)

OS: Ubuntu 16.04.1 x64
Python: 2.7.12
Tensorflow: 0.12.1 (CPU only)

dragon28 avatar Feb 09 '17 02:02 dragon28

@dragon28 check out my answer and the one from @absentdream!

legentz avatar Feb 09 '17 08:02 legentz

@legentz

Thanks :) :+1:

Managed to fix those files by using the following command:

iconv -f WINDOWS-1252 -t UTF-8//TRANSLIT old_file -o new_file
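For anyone without iconv, roughly the same conversion can be sketched in Python. The function name convert_to_utf8 and its parameters are hypothetical, and the source encoding is assumed to be Windows-1252 as reported earlier in this thread. Every Windows-1252 character has a UTF-8 representation, so no transliteration should be needed; errors="replace" is only there for the few byte values Python's cp1252 codec leaves undefined, and it is not an equivalent of //TRANSLIT.

```python
import io


def convert_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a text file from `src_encoding` (assumed cp1252) to UTF-8."""
    # newline="" keeps the original line endings untouched on read and write.
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    with io.open(src_path, "r", encoding=src_encoding,
                 errors="replace", newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Writing to a new path rather than overwriting in place keeps the original around in case the assumed source encoding turns out to be wrong.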

dragon28 avatar Feb 09 '17 23:02 dragon28

Hello, I'm having the same issue. I'm running tensorflow version '0.12.0-rc1' on Ubuntu 16.04.

Upon trying to run execute.py on python 2.7.12, I get an error that says:

"UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 4: ordinal not in range(128)"

While trying to run execute.py on python 3.5.2, I get an error that says:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 28: invalid start byte

I tried running the code again after emptying the working_dir folder but I'm getting the same error.

Aryal007 avatar Feb 22 '17 11:02 Aryal007

Just convert the files; that worked successfully for me.

musca1997 avatar Mar 13 '17 05:03 musca1997

Which files do we have to convert to UTF-8?

preethamsridhar avatar Aug 09 '18 13:08 preethamsridhar