THULAC-Python

An Efficient Lexical Analyzer for Chinese

87 THULAC-Python issues

On Windows with a 2.4 GHz CPU, segmentation plus POS tagging of a 30 MB text ran for several hours.

```
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\site-packages\thulac\__init__.py", line 58, in __init__
    self.__tagging_decoder.init((self.__prefix+"model_c_model.bin"),(self.__prefix+"model_c_dat.bin"),(self.__prefix+"model_c_label.txt"))
  File "C:\Python27\lib\site-packages\thulac\character\CBTaggingDecoder.py", line 36, in init
    self.model = CBModel(modelFile)
  File "C:\Python27\lib\site-packages\thulac\character\CBModel.py", line 58,...
```

When I try thulac.cut($sentence), this error is raised:

```
tmp, tagged = self.__tagging_decoder.segmentTag(raw, __poc_cands)
start = time.clock()
AttributeError: module 'time' has no attribute 'clock'
```

It turns out that the function time.clock()...

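I checked: Python 3.8 and later are not supported, so you have to downgrade manually to 3.7 or lower. I'd suggest just fixing it; it's a very simple change.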

Has the new version dropped the -deli parameter? `-deli delimiter`: sets the separator between a word and its POS tag; the default is the underscore `_`.
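If the option is not available, one possible workaround is to post-process the tagged output yourself. The sketch below assumes the tagger can give you a list of `[word, tag]` pairs; `join_with_delimiter` is a hypothetical helper written for illustration, not part of THULAC's API, and the pairs in the example are hand-written rather than produced by THULAC.

```python
from typing import Sequence


def join_with_delimiter(pairs: Sequence[Sequence[str]], deli: str = "_") -> str:
    """Join (word, tag) pairs into space-separated 'word<deli>tag' tokens.

    Hypothetical helper: assumes input shaped like [["word", "tag"], ...].
    The default "_" mirrors the default separator described for -deli.
    """
    return " ".join(f"{word}{deli}{tag}" for word, tag in pairs)


# Hand-written pairs, no THULAC call involved.
print(join_with_delimiter([["我", "r"], ["爱", "v"], ["北京", "ns"]], deli="/"))
# → 我/r 爱/v 北京/ns
```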

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa8 in position 0: incomplete multibyte sequence
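One common cause of a `'gbk' codec can't decode` error is opening a UTF-8 file without specifying an encoding: in Python 3, `open()` then falls back to the locale's preferred encoding, which on Chinese-locale Windows is typically GBK (cp936). Whether that is the cause in this report is an assumption. A minimal Python 3 sketch that passes the encoding explicitly (`read_utf8_text` is a hypothetical helper name):

```python
import os
import tempfile


def read_utf8_text(path: str) -> str:
    # Pass encoding="utf-8" explicitly instead of relying on the locale
    # default, which is typically GBK (cp936) on Chinese-locale Windows.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Round-trip demo: write a temporary UTF-8 file, then read it back.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("中文分词测试")
    path = tmp.name
try:
    print(read_utf8_text(path))  # → 中文分词测试
finally:
    os.remove(path)
```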

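The problem is the line start = time.clock(), which calls time.clock(), a function that is no longer supported. On closer inspection, the variable start is never used after it is assigned, so it is dead code. After deleting that line, the package runs normally on Python 3.9.5 + Windows.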

time.clock() was removed in Python 3.8, but sentence segmentation still calls it.
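Until the package itself is fixed, a possible stopgap that avoids editing installed files is to restore the missing attribute before calling `cut`. This is a sketch under the assumption that thulac only uses `time.clock()` for timing (as noted above, the value is never used) and looks it up as `time.clock` at call time, which the traceback suggests; `time.perf_counter()` is the standard-library replacement for timing.

```python
import time

# Hypothetical workaround: time.clock() was removed in Python 3.8.
# Because code such as `start = time.clock()` looks up the attribute on the
# shared `time` module object when it runs, adding it back here makes that
# call succeed again. perf_counter() is monotonic and meant for timing.
if not hasattr(time, "clock"):
    time.clock = time.perf_counter

# Import and use thulac after this point in the same process.
```

Deleting the unused line, as described above, is the cleaner fix; the patch only helps when you cannot modify the installed package.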

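On Win11, in Python's command-line (interactive) mode, GBK is required for some reason (on Linux, UTF-8 works fine), and the output also has to be switched to GBK to be readable.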