THULAC
An Efficient Lexical Analyzer for Chinese
neg.dat ns.dat singlepun.dat t2s.dat time.dat xu.dat
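Memory usage is too high; the model files are uncompressed and too large. Memory usage is too high: a very large block of memory is allocated all at once. Consider compressing the model files (after compression my cws_dat was only 18M, down from about 60M) and then decompressing on the fly during use. That way both the model file size and the memory footprint could be reduced considerably.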
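The suggestion above (compress the model once, then decompress piece by piece while it is being used) can be sketched roughly as follows. This is only an illustration using Python's standard `zlib` module, not how THULAC actually loads its models; the helper names `compress_bytes` and `iter_decompressed` and the chunk size are assumptions made for the example.

```python
import zlib

CHUNK = 64 * 1024  # bytes of compressed input fed in per step; arbitrary for this sketch


def compress_bytes(raw):
    """Compress a whole model blob once, offline (hypothetical helper)."""
    return zlib.compress(raw, 9)


def iter_decompressed(compressed, chunk_size=CHUNK):
    """Yield the decompressed data piece by piece instead of in one call."""
    inflater = zlib.decompressobj()
    for start in range(0, len(compressed), chunk_size):
        piece = inflater.decompress(compressed[start:start + chunk_size])
        if piece:
            yield piece
    tail = inflater.flush()  # whatever is still buffered inside the decompressor
    if tail:
        yield tail


# Stand-in for a model file; real .dat contents will compress differently.
fake_model = bytes(range(256)) * 4096
compressed = compress_bytes(fake_model)
restored = b"".join(iter_decompressed(compressed, 1024))
```

In a real loader each yielded piece would be consumed as it arrives rather than joined back together; the join here only exists to check the round trip.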
hello word. ---> hello_x word._x
When I gave an empty file to the program train_c, I found this issue: ``` ================================================================= ==19071==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x7f51e6a9b800 #0 0x7f51e5ac4b2a in...
When I try to run the thulac and thulac_test programs, I get this: ``` ASAN:SIGSEGV ================================================================= ==12976==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fbf3a4841ba bp 0x000000000000 sp 0x7ffc53739440 T0)...
When I try to run the program train_c with the command line: ``` ./train_c train_file outfile ``` The address sanitizer found a heap buffer overflow issue: ``` ================================================================= ==11181==ERROR: AddressSanitizer:...
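Mixed Chinese-English word segmentation
Hi, is mixed Chinese-English word segmentation supported at the moment? In my test the text was not segmented correctly. In [4]: for t, f in seg.cut('this is a test sentence. 这个是计算广告的数据啊'): ...: print('%s %s' % (t, f)) ...: this x v i g s g g a g...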
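One possible workaround for mixed input like this is to split the text into runs of Chinese characters, runs of Latin letters or digits, and single punctuation marks before handing only the Chinese runs to the segmenter. The sketch below is a preprocessing idea, not THULAC's own behavior; `split_runs` is a hypothetical helper, and the regular expression only covers the basic CJK Unified Ideographs block.

```python
import re

# Alternatives, in order: a run of basic CJK ideographs, a run of ASCII
# letters/digits, or one other non-whitespace character (e.g. punctuation).
_RUN = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+|[^\u4e00-\u9fffA-Za-z0-9\s]")


def split_runs(text):
    """Split text into script-homogeneous runs, dropping whitespace."""
    return _RUN.findall(text)


def is_chinese_run(run):
    """True if the run starts with a basic CJK ideograph."""
    return "\u4e00" <= run[0] <= "\u9fff"


runs = split_runs("this is a test sentence. 这个是计算广告的数据啊")
chinese_only = [r for r in runs if is_chinese_run(r)]
```

With this split, English words and punctuation such as the trailing period in `sentence.` come out as separate tokens, and only the entries in `chinese_only` would need to go through the Chinese segmenter.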
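``` #1.1. Command format, C++ version ./thulac [-t2s] [-seg_only] [-deli delimeter] [-user userword.txt] read from and write to the command line ./thulac [-t2s] [-seg_only] [-deli delimeter] [-user userword.txt] [-intput inputfile] [-output outputfile] read from and write to text files (note: both must be UTF-8 text) ``` I found an error in the C++ version's README.md and in the program's usage message: the input file parameter should be "-input", not "intput".
While testing with thulac I got an error about cws_label.txt. What does this file contain? Why is it not explained in the documentation?
Similarly, after compiling THULAC.so, using it from Python also reports Segmentation fault (core dumped).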