pkuseg-python

The pkuseg toolkit for multi-domain Chinese word segmentation

Results 115 pkuseg-python issues

In practical testing, many words written in Chinese numeral characters are tagged as nouns. I hope this can be improved.
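Until the tagger itself handles this, one possible workaround is to post-process the tagged output. The sketch below assumes the tagger returns a list of `(word, tag)` pairs and that `m` is the tag for numerals; both are assumptions to check against your pkuseg version. `retag_numerals` and `CN_NUMERALS` are hypothetical names introduced here, not part of pkuseg.

```python
# Characters treated as Chinese numerals (an illustrative, non-exhaustive set).
CN_NUMERALS = set("零〇一二三四五六七八九十百千万亿两")


def retag_numerals(tagged, numeral_tag="m"):
    """Retag words made up only of Chinese numeral characters.

    `tagged` is assumed to be a list of (word, tag) pairs.
    `numeral_tag` is assumed to be the tagger's label for numerals.
    """
    fixed = []
    for word, tag in tagged:
        # Only override the tag when every character is a numeral character.
        if word and all(ch in CN_NUMERALS for ch in word):
            tag = numeral_tag
        fixed.append((word, tag))
    return fixed


if __name__ == "__main__":
    sample = [("三百五十", "n"), ("个", "q"), ("苹果", "n")]
    print(retag_numerals(sample))
```

Note that a purely character-based check also retags words such as 万一 ("in case"), so a real fix would need an exception list or context.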

Importing pkuseg under Python 3.7 raises an error: AttributeError Traceback (most recent call last) in 5 from xgboost import XGBRegressor 6 from sklearn import preprocessing ----> 7 import pkuseg 8 import re 9 from sklearn.feature_selection import...

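I have found that when pkuseg segments address information, it often groups digits, letters, and Chinese characters together into a single word. I would like to add a rule so that digits can only form a word with 号, 弄, 楼, or 室, and never with any other Chinese character. Is there a way to do this? I urgently need to learn how to add segmentation rules to pkuseg. As an example of the rule: 1号, 2号 ... 9999号 should each form a word, and so should 1弄, 2弄 ... 9999弄. Handling this by adding dictionary entries is not practical.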
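One way to approximate such a rule without touching the model or the dictionary is to post-process the token list with a regular expression. The following is a minimal sketch under that assumption. It is not a pkuseg feature: `enforce_number_rules` and `UNITS` are hypothetical names, and the input is assumed to be a plain list of word strings from any segmenter.

```python
import re

# Characters that digits are allowed to combine with (taken from the request above).
UNITS = "号弄楼室"

# A piece is either digits plus at most one allowed unit character,
# or a run of non-digit characters.
_PIECE = re.compile(rf"\d+[{UNITS}]?|[^\d]+")
_DIGITS = re.compile(r"\d+")


def enforce_number_rules(tokens):
    """Split digits away from other characters, keeping only digit+unit words."""
    # Step 1: break up any token that mixes digits with other characters.
    pieces = []
    for tok in tokens:
        pieces.extend(_PIECE.findall(tok))

    # Step 2: merge a bare number with a following token that is exactly one unit.
    out = []
    for piece in pieces:
        if out and _DIGITS.fullmatch(out[-1]) and len(piece) == 1 and piece in UNITS:
            out[-1] += piece
        else:
            out.append(piece)
    return out


if __name__ == "__main__":
    print(enforce_number_rules(["长宁路", "1018弄", "5", "号", "7楼B座", "浦东3区"]))
    # → ['长宁路', '1018弄', '5号', '7楼', 'B座', '浦东', '3', '区']
```

The sketch only pairs a number with a single unit character, so a form like 12号楼 comes out as 12号 followed by 楼. Extending `_PIECE` would be needed if multi-character units should stay together.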

Could you share the training corpus?

Windows 10 64-bit, python=3.6, numpy=1.18.2, pkuseg=0.022, installed via pip. The full stack trace is as follows: --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in 14 from sklearn.svm import LinearSVC 15 import pickle ---> 16 import pkuseg ~\AppData\Local\Programs\Python\Python36\lib\site-packages\pkuseg\__init__.py in 13...

` File "D:\ProgramData\Anaconda3\envs\NLP_DEMO\lib\site-packages\urllib3\response.py", line 307, in _error_catcher raise ReadTimeoutError(self._pool, None, 'Read timed out.') urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='github-production-release-asset-2e65be.s3.amazonaws.com', port=443): Read timed out. `

Python 3.7.4 (default, Aug 13 2019, 20:35:49) Type 'copyright', 'credits' or 'license' for more information IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: import pkuseg...