MuGo

handicap=int(sgf_prop(props.get('HA', [0]))) ValueError: invalid literal for int() with base 10: '吴受先'

Open greatken999 opened this issue 8 years ago • 5 comments

When I run "python3 main.py preprocess data/other/tmp/wuqingyuan/" I get this error:

366 sgfs found.
Estimated number of chunks: 17
Traceback (most recent call last):
  File "main.py", line 94, in <module>
    argh.dispatch(parser)
  File "/usr/local/lib/python3.5/dist-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/usr/local/lib/python3.5/dist-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/usr/local/lib/python3.5/dist-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "main.py", line 49, in preprocess
    test_chunk, training_chunks = parse_data_sets(*data_sets)
  File "/mnt/ken-volume/MuGo/load_data_sets.py", line 140, in parse_data_sets
    test_chunk, training_chunks = split_test_training(positions_w_context, est_num_positions)
  File "/mnt/ken-volume/MuGo/load_data_sets.py", line 60, in split_test_training
    positions_w_context = list(positions_w_context)
  File "/mnt/ken-volume/MuGo/load_data_sets.py", line 52, in get_positions_from_sgf
    for position_w_context in replay_sgf(f.read()):
  File "/mnt/ken-volume/MuGo/sgf_wrapper.py", line 124, in replay_sgf
    handicap=int(sgf_prop(props.get('HA', [0]))),
ValueError: invalid literal for int() with base 10: '吴受先'

It looks like for some SGF files props.get('HA', [0]) returns a string, not an int.
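The failure can be reproduced without MuGo. A minimal sketch, where `try_int` is a hypothetical helper used only for illustration:

```python
def try_int(value):
    """Return (True, parsed int) on success, (False, error message) on failure."""
    try:
        return True, int(value)
    except ValueError as err:
        return False, str(err)


# A numeric HA value parses fine; free text such as '吴受先' does not.
print(try_int('2'))
print(try_int('吴受先'))
```

So any SGF whose HA property holds text instead of a number will hit this ValueError in replay_sgf.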

greatken999 avatar Feb 15 '17 22:02 greatken999

Can you give me an example of the sgf file that it's running into issues on?

I suspect it's an sgf file that violates the standards, so having the file itself would be useful to be able to reproduce and verify the fix.

brilee avatar Feb 22 '17 13:02 brilee

wqy00.zip

Most SGF files in China use the GB18030 codec, so I changed line 48 of load_data_sets.py:

    # with open(file) as f:
    with open(file, 'rt', encoding='gb18030', errors='ignore') as f:

to fix this error:

366 sgfs found.
Estimated number of chunks: 17
Traceback (most recent call last):
  File "main.py", line 94, in <module>
    argh.dispatch(parser)
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "main.py", line 49, in preprocess
    test_chunk, training_chunks = parse_data_sets(*data_sets)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 140, in parse_data_sets
    test_chunk, training_chunks = split_test_training(positions_w_context, est_num_positions)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 60, in split_test_training
    positions_w_context = list(positions_w_context)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 52, in get_positions_from_sgf
    for position_w_context in replay_sgf(f.read()):
  File "/usr/lib64/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 5: invalid continuation byte

greatken999 avatar Feb 24 '17 04:02 greatken999

Oh.. ugh, this makes me sad. So, the SGF file should declare that its encoding is GB18030; I can't just assume it. Most western-generated SGFs assume UTF-8, so putting in this new assumption would just break the other half of SGFs.
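For reference, SGF FF[4] defines a root property, CA, for declaring the charset. Below is a rough sketch of reading it from the raw bytes before decoding. `declared_charset` is a hypothetical helper, not part of MuGo, and the regex scan is crude: it does not parse the SGF tree, so it could also match text inside a comment.

```python
import re

# Matches CA[...] when "CA" is not the tail of a longer property identifier.
_CA_PATTERN = re.compile(rb'(?<![A-Z])CA\[([^\]]*)\]')


def declared_charset(raw_bytes, default=None):
    """Return the charset named by the CA property, or `default` if absent."""
    match = _CA_PATTERN.search(raw_bytes)
    if match is None:
        return default
    name = match.group(1).decode('ascii', errors='replace').strip()
    return name or default
```

Files like the ones in this issue would only benefit if the editor that produced them actually wrote a CA property, which may well not be the case.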

The other issue is that the HA property should be a number http://www.red-bean.com/sgf/go.html#types , not "Wu played first", even though that was the convention back then. I can't really ask you to go fix whatever SGF editor created these files, though, so I think the best I could do is just have a try-except to try different encodings.
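A minimal sketch of that try-except idea, assuming the candidate encodings are UTF-8 and GB18030. `decode_sgf_bytes` is a hypothetical helper, not existing MuGo code:

```python
def decode_sgf_bytes(raw, encodings=('utf-8', 'gb18030')):
    """Decode raw SGF bytes, trying each candidate encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to the first encoding and
    # substitute U+FFFD for the bytes that cannot be decoded.
    return raw.decode(encodings[0], errors='replace')
```

UTF-8 goes first because it is strict: GB18030 bytes for Chinese text usually fail to decode as UTF-8, whereas the reverse order could silently turn UTF-8 files into mojibake. The file would be opened with mode 'rb' and its contents passed through this helper before calling replay_sgf.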

brilee avatar Feb 24 '17 05:02 brilee

Yes, I fixed this bug by changing sgf_wrapper.py to:

    # requires `import traceback` at the top of sgf_wrapper.py
    try:
        metadata = GameMetadata(
            result=sgf_prop(props.get('RE')),
            handicap=int(sgf_prop(props.get('HA', [0]))),
            board_size=19)
    except Exception:
        # HA is not a number (e.g. '吴受先'): fall back to no handicap
        # and append the traceback to error.txt.
        metadata = GameMetadata(
            result=sgf_prop(props.get('RE')),
            handicap=0,
            board_size=19)
        with open("./error.txt", 'a') as f:
            traceback.print_exc(file=f)

greatken999 avatar Feb 24 '17 06:02 greatken999

Hi brilee:

The encoding bug is fixed; I tested it with both UTF-8 and GB18030 SGF files and it works. You need to run "pip3 install cchardet" first to install the cchardet module.

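Change load_data_sets.py line 48 to:

    import cchardet as chardet

    def get_positions_from_sgf(file):
        # Read the raw bytes first so cchardet can guess the encoding.
        with open(file, 'rb') as f:
            result = chardet.detect(f.read())['encoding']
        # Reopen as text with the detected encoding, dropping undecodable bytes.
        with open(file, 'rt', encoding=result, errors='ignore') as f:
            # rest of the function unchanged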

greatken999 avatar Feb 24 '17 11:02 greatken999