THULAC-Python

Problem importing an external dictionary

Open djblovecxc opened this issue 7 years ago • 4 comments

I imported an external dictionary and was told its format was wrong, so I changed the "r" in f = open(filename, "r") to "rb". That produced a new problem:

    File "D:\Python\lib\site-packages\thulac\manage\Postprocesser.py", line 20, in __init__
      dm.makeDat(lexicon, 0)
    File "D:\Python\lib\site-packages\thulac\base\Dat.py", line 220, in makeDat
      base = self.assign(0, children, True)
    File "D:\Python\lib\site-packages\thulac\base\Dat.py", line 196, in assign
      base = self.alloc(offsets)
    File "D:\Python\lib\site-packages\thulac\base\Dat.py", line 159, in alloc
      while (2 * (base + ord(offsets[size - 1])) >= self.datSize):
    TypeError: ord() expected string of length 1, but int found

What is causing this? The external dictionary is a UTF-8 file in the following format:

    罗氏婴儿配方粉 n
    挂花大头菜 n
    黄毛籽 n
    青豆 n
    儿童营养饼干 n
    汤菜 n
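The TypeError in the traceback above is consistent with how Python 3 handles binary mode: a file opened with "rb" yields bytes, and indexing a bytes object returns an int, which ord() rejects. The following is a minimal sketch of that behavior only, not THULAC's actual code; the helper name last_code_point is hypothetical and merely mimics the ord(offsets[size - 1]) expression from the traceback.

```python
def last_code_point(offsets):
    """Mimic ord(offsets[size - 1]) from the traceback (hypothetical helper)."""
    size = len(offsets)
    return ord(offsets[size - 1])


text_word = "青豆"                      # what a text-mode read ("r") yields: str
byte_word = text_word.encode("utf-8")   # what a binary read ("rb") yields: bytes

# Indexing a str gives a one-character str, so ord() works.
print(last_code_point(text_word))

# Indexing bytes gives an int in Python 3, so ord() raises TypeError.
try:
    last_code_point(byte_word)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

If this is indeed what happens, switching to "rb" only moves the failure further down the call chain; the file still has to be decoded into str somewhere before the characters are passed to ord().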

djblovecxc commented Jul 12 '17 05:07

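I figured it out: it was an encoding problem after all. Once I converted the file to GBK, it worked. But is there no way to define custom part-of-speech tags?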
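For anyone wanting to reproduce the workaround, here is a minimal sketch of re-encoding a UTF-8 dictionary file as GBK using only the standard library. The function name convert_utf8_to_gbk and the file names are hypothetical. GBK presumably works here because open() without an encoding argument falls back to the locale's default encoding, which is often GBK (cp936) on Chinese-language Windows; that explanation is an assumption, not something confirmed in this thread.

```python
import os
import tempfile


def convert_utf8_to_gbk(src_path, dst_path):
    """Re-encode a UTF-8 text file as GBK, line by line (hypothetical helper)."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="gbk") as dst:
        for line in src:
            dst.write(line)


# Demo with a throwaway file; the file names are placeholders.
workdir = tempfile.mkdtemp()
utf8_path = os.path.join(workdir, "user_dict_utf8.txt")
gbk_path = os.path.join(workdir, "user_dict_gbk.txt")

with open(utf8_path, "w", encoding="utf-8") as f:
    f.write("青豆 n\n汤菜 n\n")

convert_utf8_to_gbk(utf8_path, gbk_path)

# Read the result back as raw bytes and decode it explicitly as GBK.
with open(gbk_path, "rb") as f:
    gbk_bytes = f.read()
print(gbk_bytes.decode("gbk"))
```

A character that GBK cannot represent would raise UnicodeEncodeError during the write, so a failure here points at the specific dictionary entry that needs attention.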

djblovecxc commented Jul 12 '17 06:07

How do I import an external dictionary? The official documentation says nothing about it.

jiangchao123 commented Aug 07 '17 10:08

Hello, this is explained in section 1.2, on the interface parameters.

MaJunhua commented Aug 10 '17 01:08

@jiangchao123 It is covered on the project home page: https://github.com/thunlp/THULAC-Python

djblovecxc commented Aug 10 '17 08:08