gpt-2-simple

UnicodeEncodeError: 'charmap' codec can't encode character. character maps to <undefined>

Open ahmadalli opened this issue 5 years ago • 7 comments

I get this error when gpt2.finetune tries to generate samples. Dataset loading works fine (that was the issue in #9).

This is the complete error text:

Traceback (most recent call last):
  File ".\persian.py", line 39, in <module>
    save_every=500
  File "E:\ai\GPT-2\envs\lib\site-packages\gpt_2_simple\gpt_2.py", line 331, in finetune
    generate_samples()
  File "E:\ai\GPT-2\envs\lib\site-packages\gpt_2_simple\gpt_2.py", line 306, in generate_samples
    fp.write('\n'.join(all_text))
  File "E:\ai\GPT-2\envs\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0631' in position 28: character maps to <undefined>

ahmadalli avatar May 19 '20 03:05 ahmadalli

The files are encoded in UTF-8 with LF line endings.

ahmadalli avatar May 19 '20 04:05 ahmadalli

It seems to fail when it tries to save the generated samples to a file.

Specifying the encoding on line 303 of gpt_2.py seems to have fixed the issue:

    with open(
            os.path.join(SAMPLE_DIR, run_name,
                         'samples-{}').format(counter), 'w', encoding='utf8', errors='ignore') as fp:
        fp.write('\n'.join(all_text))
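
For anyone who wants to check this outside of gpt-2-simple, here is a minimal sketch of the same failure and fix. It assumes the Windows default of cp1252 shown in the traceback and passes it explicitly so the behaviour is the same on any OS. The file name and sample text are only illustrative.

```python
import os
import tempfile

# Short Persian sample; '\u0631' is the character named in the traceback.
sample = "\u0631\u0648\u0632"

# Illustrative output path, not the real SAMPLE_DIR layout.
path = os.path.join(tempfile.mkdtemp(), "samples-1")

# Passing cp1252 explicitly mimics what open() chose by default on Windows.
try:
    with open(path, "w", encoding="cp1252") as fp:
        fp.write(sample)
    failed = False
except UnicodeEncodeError:
    failed = True

# With an explicit UTF-8 encoding the same write succeeds.
with open(path, "w", encoding="utf-8") as fp:
    fp.write(sample)

# Read the file back to confirm nothing was lost.
with open(path, encoding="utf-8") as fp:
    round_trip = fp.read()

print(failed, round_trip == sample)
```

Note that UTF-8 can already encode every regular character, so errors='ignore' in the patch above is not what makes Persian text work. The encoding argument is.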

syn-chromatic avatar Jul 10 '20 13:07 syn-chromatic

Thanks @Syn08 -- this fix should be merged upstream if possible!

axfelix avatar Aug 18 '20 01:08 axfelix

i still had to manually edit the code to fix this, was never merged

FlashlightET avatar Jul 31 '22 02:07 FlashlightET

> i still had to manually edit the code to fix this, was never merged

wasn't it fixed on #290?

ahmadalli avatar Aug 01 '22 07:08 ahmadalli

> > i still had to manually edit the code to fix this, was never merged
>
> wasn't it fixed on #290?

It was merged to master but not released. The latest release, 0.8.1, does not include the fix.

Flightkick avatar Aug 11 '22 11:08 Flightkick

I've run into this issue a few times. I saw the related code while trying to track it down, but never thought to check whether the fix had been released.

Technerder avatar Aug 12 '22 17:08 Technerder