nlp_chinese_corpus
JSON conversion
The JSON files store the text as Unicode escape sequences, e.g. "text":"\u30a2\u30d5\u30ea\u30ab \u30a2\u30d5\u30ea\u30ab\uff08\u82f1\u00a0:
To turn them back into characters, I run:
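# path1 is the path to one of the corpus JSON files (set elsewhere in my script)
with open(path1, encoding='utf-8') as f1:
    lines1 = f1.read()
# decode the literal \uXXXX escape sequences into characters
lines1 = lines1.encode('utf-8').decode('unicode_escape')
print(path1 + ':' + lines1)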
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 118-119: surrogates not allowed
How can I fix this error?
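A likely cause, sketched below under assumptions: `unicode_escape` decodes every `\uXXXX` on its own, so a character outside the BMP that JSON stores as a surrogate pair (for example an emoji written as `\ud83d\ude00`) becomes two lone surrogate code points. Two adjacent positions in the error ("position 118-119") fit that pattern. The exception then appears when the string is encoded as UTF-8, for instance by `print` on a UTF-8 console. The sample record below is hypothetical, and `parse_lines` is a hypothetical helper that assumes each line of the file is one JSON object. `json.loads` does join surrogate pairs, so it is the simplest fix. Re-pairing through UTF-16 with `surrogatepass` is a fallback if you keep `unicode_escape`.

```python
import json

# Hypothetical record: literal \uXXXX escapes, ending in a surrogate pair
# (\ud83d\ude00 encodes U+1F600, an emoji outside the BMP).
raw = '{"text": "\\u30a2\\u30d5\\u30ea\\u30ab \\ud83d\\ude00"}'

# What the code in the question does: each escape is decoded separately,
# leaving two lone surrogates that the UTF-8 codec refuses to encode.
# (It would also garble any real non-ASCII characters already in the file.)
broken = raw.encode('utf-8').decode('unicode_escape')
try:
    broken.encode('utf-8')
except UnicodeEncodeError as e:
    print('unicode_escape result fails:', e.reason)  # surrogates not allowed

# Fix 1: let the json module parse the data; it combines surrogate pairs.
record = json.loads(raw)
print('json.loads text:', ascii(record['text']))

# Fix 2: keep unicode_escape, then re-pair the surrogates via UTF-16.
repaired = broken.encode('utf-16', 'surrogatepass').decode('utf-16')
print('repaired:', ascii(repaired))


def parse_lines(lines):
    """Hypothetical helper: yield one dict per non-empty line,
    assuming the file holds one JSON object per line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Usage with the variables from the question would look like `with open(path1, encoding='utf-8') as f1:` followed by `for rec in parse_lines(f1): print(path1 + ':' + rec['text'])`. Printing then works on a UTF-8 console because every surrogate pair has been combined into a single character.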