MuGo handicap=int(sgf_prop(props.get('HA', [0]))) ValueError: invalid literal for int() with base 10: '吴受先'

When I run "python3 main.py preprocess data/other/tmp/wuqingyuan/" I get this error:

366 sgfs found.
Estimated number of chunks: 17
Traceback (most recent call last):
  File "main.py", line 94, in <module>
    argh.dispatch(parser)
  File "/usr/local/lib/python3.5/dist-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/usr/local/lib/python3.5/dist-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/usr/local/lib/python3.5/dist-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "main.py", line 49, in preprocess
    test_chunk, training_chunks = parse_data_sets(*data_sets)
  File "/mnt/ken-volume/MuGo/load_data_sets.py", line 140, in parse_data_sets
    test_chunk, training_chunks = split_test_training(positions_w_context, est_num_positions)
  File "/mnt/ken-volume/MuGo/load_data_sets.py", line 60, in split_test_training
    positions_w_context = list(positions_w_context)
  File "/mnt/ken-volume/MuGo/load_data_sets.py", line 52, in get_positions_from_sgf
    for position_w_context in replay_sgf(f.read()):
  File "/mnt/ken-volume/MuGo/sgf_wrapper.py", line 124, in replay_sgf
    handicap=int(sgf_prop(props.get('HA', [0]))),
ValueError: invalid literal for int() with base 10: '吴受先'

It looks like for some SGF files props.get('HA', [0]) returns a string, not an int.

Feb 15 '17 22:02 greatken999

Can you give me an example of the sgf file that it's running into issues on?

I suspect it's an sgf file that violates the standards, so having the file itself would be useful to be able to reproduce and verify the fix.

Feb 22 '17 13:02 brilee

Most SGF files in China use the GB18030 encoding, so I changed load_data_sets.py, line 48:

wqy00.zip

# with open(file) as f:
with open(file, 'rt', encoding='gb18030', errors='ignore') as f:

to fix this error:

366 sgfs found.
Estimated number of chunks: 17
Traceback (most recent call last):
  File "main.py", line 94, in <module>
    argh.dispatch(parser)
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "main.py", line 49, in preprocess
    test_chunk, training_chunks = parse_data_sets(*data_sets)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 140, in parse_data_sets
    test_chunk, training_chunks = split_test_training(positions_w_context, est_num_positions)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 60, in split_test_training
    positions_w_context = list(positions_w_context)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 52, in get_positions_from_sgf
    for position_w_context in replay_sgf(f.read()):
  File "/usr/lib64/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 5: invalid continuation byte

Feb 24 '17 04:02 greatken999

Oh.. ugh, this makes me sad. So, the SGF file should declare that its encoding is GB18030; I can't just assume it. Most western-generated SGFs assume UTF-8, so putting in this new assumption would just break the other half of SGFs.
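For reference, the SGF FF[4] standard lets a file declare its charset with the CA root property (for example CA[gb18030]). A minimal sketch of honoring that declaration, assuming the property appears in plain ASCII in the raw bytes: `declared_encoding` is a hypothetical helper, not part of MuGo, and the UTF-8 default is an assumption that follows the convention described above rather than the spec's own default.

```python
import re

# Matches an SGF charset property such as CA[gb18030] in the raw bytes.
# This is a simple scan, not a real SGF parser, so it could also match
# the same text inside a comment or another property value.
_CA_PATTERN = re.compile(rb"CA\[([^\]]+)\]")


def declared_encoding(raw, default="utf-8"):
    """Return the charset named by the CA property, or `default` if absent."""
    match = _CA_PATTERN.search(raw)
    if match is None:
        return default
    return match.group(1).decode("ascii", errors="replace").strip()


# Example: a GB18030 file that declares its own encoding.
raw = "(;GM[1]FF[4]CA[gb18030]PB[吴清源])".encode("gb18030")
text = raw.decode(declared_encoding(raw))
```

If the declared name is not a codec Python knows, `raw.decode` raises LookupError, so a caller would still want a fallback.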

The other issue is that the HA property should be a number http://www.red-bean.com/sgf/go.html#types , not "Wu played first", even though that was the convention back then. I can't really ask you to go fix whatever SGF editor created these files, though, so I think the best I could do is just have a try-except to try different encodings.
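A rough sketch of what such a try-except over encodings could look like. The helper names `decode_sgf_bytes` and `read_sgf_text` and the candidate list are assumptions for illustration, not existing MuGo code. UTF-8 goes first because it is strict and rejects most GB18030 byte sequences, whereas GB18030 would accept many files that are not really GB18030.

```python
def decode_sgf_bytes(raw, encodings=("utf-8", "gb18030")):
    """Decode raw SGF bytes with the first candidate encoding that succeeds."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to the first candidate and
    # substitute U+FFFD for the undecodable bytes.
    return raw.decode(encodings[0], errors="replace")


def read_sgf_text(path, encodings=("utf-8", "gb18030")):
    """Read an SGF file as bytes and decode it with decode_sgf_bytes."""
    with open(path, "rb") as f:
        return decode_sgf_bytes(f.read(), encodings)
```

The file is read once in binary mode, so the decoding attempts do not need to reopen it.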

Feb 24 '17 05:02 brilee

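Yes. I fixed this bug by changing sgf_wrapper.py to:

    # requires "import traceback" at the top of sgf_wrapper.py
    try:
        metadata = GameMetadata(
            result=sgf_prop(props.get('RE')),
            handicap=int(sgf_prop(props.get('HA', [0]))),
            board_size=19)
    except ValueError:
        # HA is not a number (e.g. '吴受先'): fall back to no handicap
        metadata = GameMetadata(
            result=sgf_prop(props.get('RE')),
            handicap=0,
            board_size=19)
        # append the traceback to a log file for later inspection
        with open("./error.txt", 'a') as f:
            traceback.print_exc(file=f)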
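The same fallback can also be isolated in a small helper so that GameMetadata is only built once. This is only a sketch: `parse_handicap` is a hypothetical name, and treating a non-numeric HA value as "no handicap" is an assumption that mirrors the workaround above.

```python
def parse_handicap(value, default=0):
    """Return the HA value as an int, or `default` if it is not a number."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Covers text such as '吴受先' (ValueError) and None (TypeError).
        return default
```

In sgf_wrapper.py it would presumably be called as handicap=parse_handicap(sgf_prop(props.get('HA', [0]))).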

Feb 24 '17 06:02 greatken999

Hi brilee:

The encoding bug is fixed; I tested it with both UTF-8 and GB18030 SGF files and it works. You need to run "pip3 install cchardet" first to install the cchardet module.

Change load_data_sets.py at line 48 to:

    import cchardet as chardet

    def get_positions_from_sgf(file):
        # detect the encoding from the raw bytes first
        with open(file, 'rb') as f:
            result = chardet.detect(f.read())['encoding']
        # then reopen the file as text with the detected encoding
        with open(file, 'rt', encoding=result, errors='ignore') as f:

Feb 24 '17 11:02 greatken999
