gpt-2-simple
UnicodeEncodeError: 'charmap' codec can't encode character. character maps to <undefined>
I'm getting this error when gpt2.finetune tries to generate samples. Dataset loading works fine (that was the issue in #9).
This is the complete error text:
Traceback (most recent call last):
  File ".\persian.py", line 39, in <module>
    save_every=500
  File "E:\ai\GPT-2\envs\lib\site-packages\gpt_2_simple\gpt_2.py", line 331, in finetune
    generate_samples()
  File "E:\ai\GPT-2\envs\lib\site-packages\gpt_2_simple\gpt_2.py", line 306, in generate_samples
    fp.write('\n'.join(all_text))
  File "E:\ai\GPT-2\envs\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0631' in position 28: character maps to <undefined>
The files are encoded in UTF-8 with LF line endings.
It seems to fail when it tries to save the generated samples to a file.
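The traceback shows the write going through `encodings\cp1252.py`. When `open()` is called without `encoding=`, Python on Windows typically falls back to the locale's ANSI code page (cp1252 here), which has no mapping for Arabic-script letters such as U+0631 (ARABIC LETTER REH). A minimal reproduction, independent of gpt-2-simple:

```python
# Reproduce the failure without gpt-2-simple: cp1252 has no slot for U+0631.
sample = "\u0631"  # ARABIC LETTER REH, common in Persian text

try:
    sample.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print(exc)  # 'charmap' codec can't encode character '\u0631' ...

# UTF-8 can encode every Unicode scalar value, so the same text round-trips.
utf8_bytes = sample.encode("utf-8")
assert utf8_bytes.decode("utf-8") == sample
```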
Specifying the encoding on line 303 of gpt_2.py seems to have fixed the issue:
# Pin the encoding so the write no longer uses the platform default (cp1252 on Windows)
with open(
        os.path.join(SAMPLE_DIR, run_name,
                     'samples-{}'.format(counter)),
        'w', encoding='utf8', errors='ignore') as fp:
    fp.write('\n'.join(all_text))
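For testing the patch outside the library, here is a self-contained sketch of the same write. `write_samples` is a hypothetical helper, not part of gpt-2-simple's API. It takes the directory, run name, counter and sample list that the patched block uses, and assumes the run directory may not exist yet:

```python
import os
import tempfile


def write_samples(sample_dir, run_name, counter, all_text):
    """Write samples to <sample_dir>/<run_name>/samples-<counter> as UTF-8.

    Hypothetical helper mirroring the patched block in gpt_2.py; the explicit
    encoding makes the output independent of the OS default code page.
    """
    run_dir = os.path.join(sample_dir, run_name)
    os.makedirs(run_dir, exist_ok=True)
    path = os.path.join(run_dir, 'samples-{}'.format(counter))
    with open(path, 'w', encoding='utf8') as fp:
        fp.write('\n'.join(all_text))
    return path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        out = write_samples(tmp, 'run1', 500, ['\u0633\u0644\u0627\u0645', 'hello'])
        with open(out, encoding='utf8') as fp:
            print(fp.read())
```

This sketch omits `errors='ignore'`. With UTF-8, the only `str` values that fail to encode are lone surrogates, so `errors='ignore'` mainly drops those characters silently. Leaving it out makes such cases fail loudly instead.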
Thanks @Syn08 -- this fix should be merged upstream if possible!
I still had to edit the code manually to fix this; it was never merged.
Wasn't it fixed in #290?
It was merged to master but not released. The latest version, 0.8.1, does not include this fix.
I've run into this issue a few times. I saw the related code while trying to track down the problem but never thought to check whether the fix had been released.
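Until a release includes the fix, one way to predict whether an unpatched install will hit this error is to check which encoding `open()` falls back to on your machine. Running Python in UTF-8 mode (`python -X utf8` or `PYTHONUTF8=1`, Python 3.7+) changes that default to UTF-8 without editing the library. A minimal sketch; both function names are hypothetical helpers, not part of gpt-2-simple:

```python
import os
import tempfile


def default_open_encoding() -> str:
    """Return the encoding open() picks when encoding= is omitted."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as fp:  # deliberately no encoding= argument
            return fp.encoding
    finally:
        os.remove(path)


def default_encoding_can_write(text: str) -> bool:
    """True if writing `text` with the default encoding would not raise."""
    try:
        text.encode(default_open_encoding())
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print("default encoding:", default_open_encoding())
    print("can write Persian text:", default_encoding_can_write("\u0631"))
```

On a Windows machine using cp1252 and not in UTF-8 mode, the second line would be expected to print `False`.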