THULAC-Python

Cannot cut a UTF-8 input file, but can cut a GBK file

Open l1t1 opened this issue 6 years ago • 0 comments

D:\Python35-32>python -m thulac inputu.txt output.txt seg_only
Model loaded succeed
Traceback (most recent call last):
  File "D:\Python35-32\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Python35-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Python35-32\lib\site-packages\thulac\__main__.py", line 9, in <module>
    lac.cut_f(sys.argv[1], sys.argv[2])
  File "D:\Python35-32\lib\site-packages\thulac\__init__.py", line 189, in cut_f
    for line in input_f:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xb4 in position 42: illegal multibyte sequence

D:\Python35-32>python -m thulac input.txt output.txt seg_only
Model loaded succeed
successfully cut file input.txt!
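
Judging from the traceback, the input file appears to be decoded with the Windows locale codec (gbk here) rather than UTF-8, which would explain why only the GBK file works. Until that is fixed, one possible workaround is to convert the file to the locale encoding before cutting it. The sketch below uses only the standard library; the helper name `reencode_file` and the file name `input_gbk.txt` are hypothetical and not part of thulac.

```python
import locale


def reencode_file(src_path, dst_path, src_encoding="utf-8-sig", dst_encoding=None):
    """Copy a text file, converting it from src_encoding to dst_encoding.

    Hypothetical helper, not part of thulac. "utf-8-sig" reads UTF-8 with or
    without a byte order mark. If dst_encoding is None, the locale's preferred
    encoding is used (gbk on the Windows setup shown above).
    """
    if dst_encoding is None:
        dst_encoding = locale.getpreferredencoding(False)
    with open(src_path, "r", encoding=src_encoding) as src, \
            open(dst_path, "w", encoding=dst_encoding, errors="replace") as dst:
        for line in src:
            # Characters the target codec cannot represent are written as "?".
            dst.write(line)


if __name__ == "__main__":
    reencode_file("inputu.txt", "input_gbk.txt")
```

The converted file can then be passed to the same command as before, for example `python -m thulac input_gbk.txt output.txt seg_only`. This only sidesteps the problem; characters that gbk cannot represent are lost in the conversion.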

l1t1 · Sep 27 '19 12:09