gpt-2-simple

can't encode character

Open mrx23dot opened this issue 2 years ago • 2 comments

[898 | 64206.12] loss=0.76 avg=0.84
[899 | 64275.59] loss=0.40 avg=0.84
[900 | 64345.04] loss=0.53 avg=0.83
======== SAMPLE 1 ========

Traceback (most recent call last):
  File "C:\tmp\btc_all\btc_LUT\generators\gen_infintext_gpt2.py", line 27, in <module>
  File "C:\Python37\lib\site-packages\gpt_2_simple\gpt_2.py", line 334, in finetune
    generate_samples()
  File "C:\Python37\lib\site-packages\gpt_2_simple\gpt_2.py", line 309, in generate_samples
    fp.write('\n'.join(all_text))
  File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0e4e' in position 1908: character maps to <undefined>

This could be solved by encoding the text as UTF-8 before writing it, e.g. "test".encode("utf-8", "ignore")
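A minimal sketch of that idea, not the library's actual code. The traceback shows fp.write going through cp1252, which suggests the sample file was opened without an explicit encoding, so the Windows default was used. Note that str.encode returns bytes, which a text-mode file will not accept, so passing encoding="utf-8" to open() is probably the simpler route. The names write_samples, sample_text, and out_path are hypothetical and only serve the illustration.

```python
import os
import tempfile

# Sample text containing U+0E4E, the character named in the traceback.
sample_text = "generated sample \u0e4e"


def write_samples(path, text, encoding="utf-8"):
    """Write text with an explicit encoding instead of the platform default."""
    with open(path, "w", encoding=encoding) as fp:
        fp.write(text)


# Hypothetical output location, used only for this example.
out_path = os.path.join(tempfile.mkdtemp(), "samples.txt")
write_samples(out_path, sample_text)

# cp1252 cannot represent U+0E4E, which reproduces the failure seen above.
try:
    sample_text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# errors="ignore" silently drops characters the codec cannot represent.
lossy = sample_text.encode("cp1252", "ignore").decode("cp1252")
```

With UTF-8 nothing is dropped, so "ignore" only matters when the target codec is a narrow one such as cp1252.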

mrx23dot, May 24 '22 09:05

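I fought with this myself. It has to do with the default encoding: on Windows, Python writes text files with the locale code page (cp1252 in the traceback above) unless told otherwise. You can fix it by setting PYTHONUTF8=1 under System Properties > Advanced > Environment Variables.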
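To check whether the variable took effect, something like the following may help. It is only a sketch: it reports the current interpreter's UTF-8 mode and default text encoding, then starts a child interpreter with PYTHONUTF8=1 set to show the difference. It assumes Python 3.7 or later; describe_default_encoding is a hypothetical helper name.

```python
import locale
import os
import subprocess
import sys


def describe_default_encoding():
    """Report whether UTF-8 mode is on and which encoding open() defaults to."""
    return {
        "utf8_mode": bool(sys.flags.utf8_mode),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


info = describe_default_encoding()
print(info)

# Run a child interpreter with PYTHONUTF8=1 to see the flag change.
child_env = dict(os.environ, PYTHONUTF8="1")
child = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
    env=child_env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
child_mode = child.stdout.strip()
print("utf8_mode with PYTHONUTF8=1:", child_mode)
```

A new terminal (or IDE restart) is usually needed before a changed environment variable is visible to Python.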

kreas, Oct 04 '22 02:10

I think you need to put these two lines at the beginning of the file and save it as UTF-8:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

mrx23dot, Oct 04 '22 08:10