THULAC-Python issues

#If only word segmentation is needed, add the "seg_only" parameter: python -m thulac input.txt output.txt seg_only

2

After adding the seg_only parameter, the generated output.txt still contains part-of-speech tags. How can this be fixed?

S-GeGe
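
As a stopgap for the seg_only report above, the tags can be stripped from the output after the fact. This is a minimal sketch, not a fix for the command-line behavior itself. It assumes each token in output.txt has the form `word_tag`, separated by spaces, with `_` as the delimiter; `strip_pos_tags` is a hypothetical helper name.

```python
def strip_pos_tags(line, deli="_"):
    """Remove a trailing `<deli><tag>` suffix from each whitespace-separated token."""
    words = []
    for token in line.split():
        # rpartition splits on the LAST delimiter, so delimiter characters
        # that belong to the word itself are preserved.
        word, sep, _tag = token.rpartition(deli)
        # If the token has no delimiter at all, keep it unchanged.
        words.append(word if sep else token)
    return " ".join(words)


print(strip_pos_tags("我_r 爱_v 北京_ns 天安门_ns"))
```

Applying the helper line by line to output.txt would give plain segmented text, provided the delimiter assumption holds.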

Use time.process_time instead of time.clock

1

time.clock has been deprecated since Python 3.3 and was removed in Python 3.8.

nasyxx
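
For the time.clock report above, a minimal sketch of the replacement calls: `time.process_time()` measures CPU time of the current process, and `time.perf_counter()` measures elapsed wall-clock time. The `timed` helper is a hypothetical name used only for illustration and is not part of THULAC.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, cpu_seconds, wall_seconds)."""
    start_cpu = time.process_time()   # CPU time, excludes sleep
    start_wall = time.perf_counter()  # high-resolution elapsed time
    result = func(*args, **kwargs)
    cpu = time.process_time() - start_cpu
    wall = time.perf_counter() - start_wall
    return result, cpu, wall


result, cpu, wall = timed(sum, range(1_000_000))
print(f"result={result} cpu={cpu:.4f}s wall={wall:.4f}s")
```

Which of the two is the better substitute depends on whether the original code meant to measure CPU time or elapsed time.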

Suggestion: specify the encoding explicitly with encoding='utf-8'

5

https://github.com/thunlp/THULAC-Python/blob/48443efa83412f11c580b683a633c05e445deba1/thulac/manage/Postprocesser.py#L13 On Windows 7 with Python 3.6.2, reading the UTF-8 dictionary file without specifying an encoding raises UnicodeDecodeError: 'gbk' codec can't decode byte …… illegal multibyte sequence

jresins
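
The encoding suggestion above can be sketched as follows. Without an `encoding` argument, `open()` falls back to a locale-dependent default (gbk in the report above), so a UTF-8 file may fail to decode. The file name `dict.txt` and the helper `read_lines` are assumptions for illustration and do not reflect the actual code in Postprocesser.py.

```python
import os
import tempfile


def read_lines(path):
    """Read non-empty lines from a UTF-8 text file."""
    # Passing encoding explicitly makes the result independent of the
    # platform's default locale encoding.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dict.txt")  # hypothetical dictionary file
    with open(path, "w", encoding="utf-8") as f:
        f.write("北京\n天安门\n")
    words = read_lines(path)

print(words)
```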

PyCharm Error

thu1 = thulac.thulac() raises TypeError: 'module' object is not callable. How can this be fixed?

Chumbery
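
The message in the report above means that the object being called is a module rather than a class or function. The cause in this particular case is not stated, so the following is only a sketch that reproduces the message with a stand-in module; `demo` and `Tagger` are hypothetical names. In a real session, checking `type(thulac.thulac)` and `thulac.__file__` would show what the name actually resolved to.

```python
import types

# A stand-in module object, created in memory for illustration only.
demo = types.ModuleType("demo")


class Tagger:
    """Placeholder for a class exposed under the same name as its module."""


demo.demo = Tagger

try:
    demo()  # calling the module itself fails
except TypeError as exc:
    message = str(exc)

instance = demo.demo()  # calling the class inside the module works

print(message)
print(type(instance).__name__)
```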

Minor error in README.md

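In the custom settings section, seg_only is described as "default False; whether to perform segmentation only, without part-of-speech tagging". The default should be True: True means no part-of-speech tagging, and False means tagging.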

lori94

Does the Python version load the lite model by default?

2

As the title says: does the Python version load the lite model by default?

joyeJ

Could a parameter be added so that users can filter stop words with a custom stop-word file?

4

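After installing the Python version with pip, I used this setting from the parameter list: "-filter: use a filter to remove some meaningless words, such as 可以 ('can')".

```
thu1 = thulac.thulac(seg_only=True,filt=True)
```

However, this does not remove punctuation or stop words such as "的" from the results.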

chenliusuo
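
Until such a parameter exists, stop words can be filtered after segmentation. This is a minimal sketch under stated assumptions: the input is already a list of segmented words, and `STOP_WORDS`, `PUNCTUATION`, and `filter_tokens` are hypothetical names with illustrative contents, not part of THULAC.

```python
import string

# Illustrative stop-word set; a real list would be much longer.
STOP_WORDS = {"的", "可以", "了"}
# ASCII punctuation plus a few common full-width marks.
PUNCTUATION = set(string.punctuation) | set("，。！？、；：（）")


def filter_tokens(tokens, stop_words=STOP_WORDS):
    """Drop tokens that are stop words or single punctuation marks."""
    return [t for t in tokens if t not in stop_words and t not in PUNCTUATION]


tokens = ["我", "的", "书", "可以", "借", "给", "你", "。"]
print(filter_tokens(tokens))
```

The stop-word set could equally be read from a user-supplied file, one word per line, opened with encoding="utf-8".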

MIT license and restriction for commercial use?

The repo indicates an MIT license. However, the following restriction is imposed on commercial use: "Institutions or individuals intending to use THULAC for commercial purposes should email [email protected] to negotiate a technical licensing agreement." Perhaps a more appropriate license than the general MIT one would be better.

ldong87