ChineseNER 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

Traceback (most recent call last):
  File "E:\python2.7\pycharm\PyCharm 4.5.5\helpers\pydev\pydevd.py", line 2358, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "E:\python2.7\pycharm\PyCharm 4.5.5\helpers\pydev\pydevd.py", line 1778, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "E:\python2.7\pycharm\PyCharm 4.5.5\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "ChineseNER-master/main.py", line 225, in <module>
    tf.app.run(main)
  File "tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "ChineseNER-master/main.py", line 219, in main
    train()
  File "ChineseNER-master/main.py", line 185, in train
    best = evaluate(sess, model, "dev", dev_manager, id_to_tag, logger)
  File "ChineseNER-master/main.py", line 85, in evaluate
    eval_lines = test_ner(ner_results, FLAGS.result_path)
  File "ChineseNER-master\utils.py", line 66, in test_ner
    eval_lines = return_report(output_file)
  File "ChineseNER-master\conlleval.py", line 282, in return_report
    counts = evaluate(f)
  File "ChineseNER-master\conlleval.py", line 74, in evaluate
    for line in iterable:
  File "tensorflow\lib\codecs.py", line 713, in next
    return next(self.reader)
  File "tensorflow\lib\codecs.py", line 644, in next
    line = self.readline()
  File "tensorflow\lib\codecs.py", line 557, in readline
    data = self.read(readsize, firstline=True)
  File "tensorflow\lib\codecs.py", line 501, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

I'm on TensorFlow 1.3. Has anyone run into a similar problem? Is there a fix?

Dec 29 '17 03:12 janesunflower

Have you solved it?

Jan 02 '18 13:01 lengxia

Not yet, sadly.

Jan 03 '18 00:01 janesunflower

What's wrong with it? My TensorFlow is 1.4.

Jan 04 '18 06:01 yyHaker

I found that this problem can be solved as follows.

In utils.py, change test_ner so that the output file is opened with an explicit UTF-8 encoding:

    def test_ner(results, path):
        """
        Run perl script to evaluate model
        """
        output_file = os.path.join(path, "ner_predict.utf8")
        # Write the predictions as UTF-8 so they can be read back as UTF-8.
        with open(output_file, "w", encoding='utf8') as f:
            to_write = []
            for block in results:
                for line in block:
                    to_write.append(line + "\n")
                to_write.append("\n")
            f.writelines(to_write)
        eval_lines = return_report(output_file)
        return eval_lines

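The reason is that the file can only be read back as UTF-8 if it was written as UTF-8. Without an explicit encoding, open() falls back to the system's default encoding, which on a Chinese-locale Windows machine is typically GBK rather than UTF-8. It has nothing to do with the TensorFlow version.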
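To make the mismatch concrete, here is a minimal sketch that reproduces the error outside the project. The sample line and the helper name write_then_read are made up for illustration and are not part of ChineseNER; the sketch assumes Python 3 and that the original file was written as GBK.

```python
import os
import tempfile

# Hypothetical sample line: a fullwidth comma followed by two tag columns.
sample = "，\tO\tO\n"


def write_then_read(write_encoding, read_encoding="utf-8"):
    """Write `sample` with one encoding and read it back with another."""
    fd, path = tempfile.mkstemp(suffix=".utf8")
    os.close(fd)
    try:
        with open(path, "w", encoding=write_encoding) as f:
            f.write(sample)
        with open(path, "r", encoding=read_encoding) as f:
            return f.read()
    finally:
        os.remove(path)


# Matching encodings round-trip cleanly.
assert write_then_read("utf-8") == sample

# Writing as GBK and reading as UTF-8 raises the same kind of error.
try:
    write_then_read("gbk")
except UnicodeDecodeError as err:
    print(err)
```

In GBK the fullwidth comma starts with byte 0xa3, which is not a valid first byte in UTF-8, so a results file that begins with fullwidth punctuation fails at position 0.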

Jan 04 '18 06:01 yyHaker

@yyHaker, good job. It helped me solve this problem, thanks.

Jan 04 '18 07:01 lengxia

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

2018-02-02 12:08:27,160 - log\train.log - INFO - iteration:1 step:1000/1044, NER loss: 5.380470
2018-02-02 12:08:36,132 - log\train.log - INFO - evaluate:dev
Traceback (most recent call last):

It runs up to this point and then hits the same encoding problem. Has anyone else run into this?

Feb 02 '18 05:02 ylwctyt

This is still the encoding problem. You can step through with a debugger to find where a file is written with one encoding and read with another.
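One way to narrow it down, sketched here under the assumption that the file is either UTF-8 or GBK, is to read the raw bytes and see which encodings decode them without error. The function name candidate_encodings is hypothetical and not part of the project.

```python
def candidate_encodings(raw, encodings=("utf-8", "gbk")):
    """Return the encodings from `encodings` that decode `raw` without error."""
    ok = []
    for enc in encodings:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


# Bytes of a fullwidth comma as GBK would write them: not valid UTF-8.
print(candidate_encodings("，".encode("gbk")))
```

To check a real file, open it in binary mode ("rb") and pass the result of f.read() to the function. A successful decode only shows that the bytes are valid in that encoding, not that it is the encoding the writer intended.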

Feb 02 '18 05:02 yyHaker

@yyHaker Thanks!

Feb 09 '18 02:02 SanSLee

This is an encoding problem. If you are working on Linux, convert the file's encoding with Notepad++. If you are working on Windows, use this:

    import codecs

    with codecs.open(filename, 'r', 'utf-8') as f:
        # your processing goes here
        pass

Jun 14 '18 09:06 LiXuanming

It is very easy. You just need to change 'utf-8' to 'gbk' in 'return_report' (which the traceback above places in 'conlleval.py', called from 'test_ner' in 'utils.py').
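If hard-coding 'gbk' is too brittle, for example because the same code also runs on machines that write UTF-8, a reader that tries UTF-8 first and falls back to GBK is one possible alternative. This is only a sketch: read_lines is a hypothetical helper, not a function in ChineseNER, and it assumes Python 3.

```python
def read_lines(path, encodings=("utf-8", "gbk")):
    """Try each encoding in turn and return (lines, encoding_used)."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.readlines(), enc
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding; try the next one.
            continue
    raise ValueError("could not decode %s with any of %r" % (path, encodings))
```

UTF-8 is tried first because it is the stricter of the two, so GBK text is less likely to be misread as UTF-8 than the other way around.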

Jul 01 '19 11:07 ghost
