RocketQA

Running python index.py zh ../data/dureader.para test_index raises UnicodeDecodeError

Open • jamyriver opened this issue 1 year ago • 1 comment


Running python index.py zh ../data/dureader.para test_index fails with the following output:

    ****\anaconda3\envs\my_paddlenlp\lib\site-packages\pkg_resources\__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
      warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
    ****\anaconda3\envs\my_paddlenlp\lib\site-packages\pkg_resources\__init__.py:2870: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('mpl_toolkits'). Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
      declare_namespace(pkg)
    ****\anaconda3\envs\my_paddlenlp\lib\site-packages\pkg_resources\__init__.py:2870: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('google'). Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
      declare_namespace(pkg)
    Traceback (most recent call last):
      File "index.py", line 41, in <module>
        for line in open(data_file):
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 4: illegal multibyte sequence

jamyriver, Jun 08 '23 11:06

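Solved it myself. Change line 41 of index.py to: for line in open(data_file, "r", encoding="UTF-8", errors="ignore"):

The 'gbk' in the error message is the codec that open() picked up from the Windows locale because no encoding was given, and passing encoding="UTF-8" overrides it. Note that errors="ignore" silently drops any bytes that are not valid UTF-8.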

jamyriver, Jun 09 '23 03:06
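
For anyone hitting the same error, here is a minimal sketch of the fix as a standalone helper. It assumes the data file is UTF-8 encoded, as the fix above does. `read_lines` is a hypothetical name used for illustration only; it is not part of RocketQA or index.py. Unlike the one-line change above, it keeps the default strict error handling, so a file that is not valid UTF-8 raises an error instead of silently losing bytes.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Yield lines of a text file decoded with an explicit encoding.

    Hypothetical helper for illustration; not part of RocketQA.
    """
    # Passing encoding explicitly avoids relying on the platform default,
    # which is the locale codec (e.g. gbk) on a Chinese-locale Windows setup.
    # errors is left at its default ("strict"), so invalid bytes raise
    # UnicodeDecodeError rather than being dropped.
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Write a small UTF-8 sample file, then read it back with the helper.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".para", delete=False
    ) as tmp:
        tmp.write("第一段\n第二段\n")
    try:
        for line in read_lines(tmp.name):
            print(line)
    finally:
        os.remove(tmp.name)
```

If the file really does contain a few stray invalid bytes, passing errors="ignore" or errors="replace" to open(), as in the fix above, is a reasonable trade-off, at the cost of losing or altering those characters.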