THULAC

An Efficient Lexical Analyzer for Chinese

Results 31 THULAC issues

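Memory usage is too high. The model files are not compressed and are too large. Memory usage is too high: the program allocates a very large block of memory all at once. Consider compressing the model files (after compression my cws_dat was only 18 MB, down from about 60 MB) and decompressing them on the fly during use. That way both the model file size and the memory footprint could be reduced considerably.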

When I pass an empty file to the train_c program, AddressSanitizer reports this issue: ``` ================================================================= ==19071==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x7f51e6a9b800 #0 0x7f51e5ac4b2a in...
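The trace above is truncated, so the exact allocation site in train_c is not known. As a general illustration only, an `alloc-dealloc-mismatch (operator new [] vs operator delete)` report means memory obtained with `new[]` was released with plain `delete`. The sketch below shows two common ways to rule that class of bug out. The helper names `make_buffer` and `make_vector_buffer` are hypothetical and are not taken from the THULAC sources.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper (not from THULAC): allocates n zero-initialized ints.
// std::unique_ptr<int[]> releases the array with delete[], so the
// new[] / delete mismatch cannot occur at the call site.
std::unique_ptr<int[]> make_buffer(std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]());
}

// Hypothetical alternative: std::vector owns its storage and also handles
// n == 0, which is the size an empty input file would likely produce.
std::vector<int> make_vector_buffer(std::size_t n) {
    return std::vector<int>(n, 0);
}
```

If a raw pointer has to be kept, the equivalent manual fix is to pair every `new T[n]` with `delete[] p` rather than `delete p`.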

When I try to run the thulac and thulac_test programs, I get this: ``` ASAN:SIGSEGV ================================================================= ==12976==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fbf3a4841ba bp 0x000000000000 sp 0x7ffc53739440 T0)...

When I run the train_c program with the command line: ``` ./train_c train_file outfile ``` AddressSanitizer finds a heap buffer overflow: ``` ================================================================= ==11181==ERROR: AddressSanitizer:...

Hi, is segmentation of mixed Chinese and English text supported now? In my test the text was not segmented correctly. In [4]: for t, f in seg.cut('this is a test sentence. 这个是计算广告的数据啊'): ...: print('%s %s' % (t, f)) ...: this x v i g s g g a g...

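``` #1.1. Command format, C++ version ./thulac [-t2s] [-seg_only] [-deli delimeter] [-user userword.txt] reads from and writes to the command line ./thulac [-t2s] [-seg_only] [-deli delimeter] [-user userword.txt] [-intput inputfile] [-output outputfile] reads from and writes to text files (note that both must be UTF-8 text) ``` I found an error in the C++ version's README.md and in the program's usage message: the input file option should be "-input", not "-intput".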

While testing with thulac I got an error about cws_label.txt. What does this file contain, and why is it not explained in the documentation?

Similarly, after compiling THULAC.so, using it from Python also fails with Segmentation fault (core dumped).