MuGo
handicap=int(sgf_prop(props.get('HA', [0]))) ValueError: invalid literal for int() with base 10: '吴受先'
When I run "python3 main.py preprocess data/other/tmp/wuqingyuan/" I get this error:
366 sgfs found.
Estimated number of chunks: 17
Traceback (most recent call last):
File "main.py", line 94, in
It looks like for some SGF files props.get('HA', [0]) returns a string, not an int.
Can you give me an example of the sgf file that it's running into issues on?
I suspect it's an sgf file that violates the standards, so having the file itself would be useful to be able to reproduce and verify the fix.
Most SGF files in China use the GB18030 encoding, so I changed line 48 of load_data_sets.py:
#with open(file) as f:
with open(file, 'rt', encoding='gb18030', errors='ignore') as f:
to fix the bug:
366 sgfs found.
Estimated number of chunks: 17
Traceback (most recent call last):
File "main.py", line 94, in
Oh.. ugh, this makes me sad. So, the SGF file should declare that its encoding is GB18030; I can't just assume it. Most western-generated SGFs assume UTF-8, so putting in this new assumption would just break the other half of SGFs.
The other issue is that the HA property should be a number (http://www.red-bean.com/sgf/go.html#types), not "Wu played first", even though that was the convention back then. I can't really ask you to go fix whatever SGF editor created these files, though, so I think the best I could do is just have a try-except to try different encodings.
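A minimal sketch of that try-except idea, not MuGo's actual code. The helper name `read_sgf_text` and the candidate order are assumptions made for illustration. UTF-8 is tried first because it is strict: GB18030-encoded Chinese text usually fails to decode as UTF-8, whereas GB18030 will accept many byte sequences that are really UTF-8.

```python
def read_sgf_text(path, encodings=("utf-8", "gb18030")):
    """Return the file's text, decoded with the first encoding that works.

    Hypothetical helper: the name and the default encoding list are
    assumptions, not part of MuGo.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: decode with the first candidate, replacing bad bytes.
    return raw.decode(encodings[0], errors="replace")
```

Reading the file as bytes once and decoding in memory avoids reopening it for every candidate encoding.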
Yes, I fixed this bug by changing sgf_wrapper.py to:
import traceback  # needed at the top of sgf_wrapper.py

try:
    metadata = GameMetadata(
        result=sgf_prop(props.get('RE')),
        handicap=int(sgf_prop(props.get('HA', [0]))),
        board_size=19)
except (TypeError, ValueError):
    # HA is not a number (e.g. '吴受先'): assume no handicap and log the error
    metadata = GameMetadata(
        result=sgf_prop(props.get('RE')),
        handicap=0,
        board_size=19)
    with open("./error.txt", 'a') as f:
        traceback.print_exc(file=f)
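The same fallback can be written without duplicating the GameMetadata call. This is only a sketch: `parse_handicap` is a hypothetical helper, and it assumes the HA value arrives as a string, an int, or None.

```python
def parse_handicap(raw_value):
    """Return HA as an int, or 0 when the value is missing or not numeric.

    Hypothetical helper, not part of MuGo.
    """
    try:
        return int(raw_value)
    except (TypeError, ValueError):
        # e.g. '吴受先' ("Wu plays first") in older Chinese game records
        return 0
```

With this, the metadata construction could pass handicap=parse_handicap(sgf_prop(props.get('HA', [0]))) and drop the outer try-except. Catching only TypeError and ValueError keeps unrelated bugs visible instead of silently swallowing them.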
Hi brilee:
Encoding bug fixed; tested OK with both UTF-8 and GB18030 SGF files. You need to run "pip3 install cchardet" first to install the cchardet module.
Change load_data_sets.py line 48 to:
import cchardet as chardet

def get_positions_from_sgf(file):
    with open(file, 'rb') as f:
        result = chardet.detect(f.read())['encoding']
    with open(file, 'rt', encoding=result, errors='ignore') as f:
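One caveat with detection is that the detector may report no encoding at all (the 'encoding' entry can be None) or name a codec Python does not know. A hedged sketch of guarding against both follows; `decode_with_detection` and its `fallback` parameter are hypothetical names, and `detect` is assumed to be a callable such as cchardet.detect that returns a dict with an 'encoding' key.

```python
def decode_with_detection(raw, detect, fallback="utf-8"):
    """Decode bytes using the encoding reported by `detect`, else `fallback`.

    Hypothetical helper: `detect` is assumed to behave like cchardet.detect,
    i.e. take bytes and return a dict with an 'encoding' key (possibly None).
    """
    guess = detect(raw).get("encoding") or fallback
    try:
        return raw.decode(guess, errors="ignore")
    except LookupError:
        # The detector named a codec that Python does not recognise.
        return raw.decode(fallback, errors="ignore")
```

It would be called on bytes read once from the file, for example decode_with_detection(f.read(), chardet.detect) inside a with open(file, 'rb') block, so the file does not have to be opened twice.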