THULAC issues

How were model data files such as the region and time data trained, and can they be modified?

2

neg.dat ns.dat singlepun.dat t2s.dat time.dat xu.dat

alexlee728

Memory usage is too high; the model needs compression

5

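Memory usage is too high. The model files are not compressed and are very large, and the program requests a very large block of memory all at once. Consider compressing the model files (after I compressed cws_dat it was only 18M, down from about 60M) and decompressing them on the fly during use. That way both the model file size and the memory footprint could be reduced considerably.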

alexlee728
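The "compress, then decompress while using" idea in the report above can be sketched roughly as follows. This is not THULAC's model format or loader: the `ChunkedBlob` class, the block size, and the use of `zlib` are all assumptions chosen for illustration. The data is compressed in independent fixed-size blocks so that a read only inflates the blocks it touches instead of the whole file.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical block size; a real model layout may want another value


class ChunkedBlob:
    """Holds bytes as independently compressed blocks and inflates them on demand."""

    def __init__(self, data: bytes, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.length = len(data)
        # Each block is compressed separately so a single block can be read alone.
        self.blocks = [
            zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)
        ]

    def compressed_size(self) -> int:
        """Total size in bytes of all compressed blocks."""
        return sum(len(block) for block in self.blocks)

    def read(self, offset: int, size: int) -> bytes:
        """Return data[offset:offset + size], inflating only the blocks involved."""
        if offset < 0 or size < 0:
            raise ValueError("offset and size must be non-negative")
        end = min(offset + size, self.length)
        if offset >= end:
            return b""
        first = offset // self.chunk_size
        last = (end - 1) // self.chunk_size
        raw = b"".join(
            zlib.decompress(self.blocks[i]) for i in range(first, last + 1)
        )
        start = offset - first * self.chunk_size
        return raw[start:start + (end - offset)]


if __name__ == "__main__":
    # Stand-in bytes, not a real model file; real data will compress differently.
    model = bytes(range(256)) * 4096
    blob = ChunkedBlob(model)
    print(len(model), blob.compressed_size())
    print(blob.read(70000, 8) == model[70000:70008])
```

The trade-off in this sketch is extra CPU time per lookup in exchange for lower resident memory. A small cache of recently inflated blocks would likely be needed for lookup-heavy structures.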

Punctuation is split incorrectly when segmenting English text

hello word. ---> hello_x word._x

alexlee728
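One possible workaround for the report above, until the segmenter itself handles it, is to separate punctuation from ASCII words before or after segmentation. The sketch below is an assumption, not part of THULAC: `split_ascii_punct` is a hypothetical helper, and it is intended for ASCII spans only (any other non-space character is emitted as a single-character token).

```python
import re
from typing import List

# An ASCII word (letters/digits, optionally joined by inner apostrophes or hyphens),
# or else any single non-whitespace character such as a punctuation mark.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:['\-][A-Za-z0-9]+)*|\S")


def split_ascii_punct(text: str) -> List[str]:
    """Split ASCII text into word tokens and standalone punctuation tokens."""
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(split_ascii_punct("hello word."))  # ['hello', 'word', '.']
```

With this helper the trailing period in the reported input becomes its own token instead of staying attached to `word`.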

When I gave an empty file to the program train_c, I found this issue: ``` ================================================================= ==19071==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x7f51e6a9b800 #0 0x7f51e5ac4b2a in...

fCorleone

SEGV signal occurred when running program thulac

1

When I tried to run the thulac and thulac_test programs, I found this: ``` ASAN:SIGSEGV ================================================================= ==12976==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fbf3a4841ba bp 0x000000000000 sp 0x7ffc53739440 T0)...

fCorleone

Buffer overflow occurred during training process

When I tried to run the program train_c with the command line: ``` ./train_c train_file outfile ``` the address sanitizer found a heap buffer overflow issue: ``` ================================================================= ==11181==ERROR: AddressSanitizer:...

fCorleone

Mixed Chinese-English word segmentation

2

Hi, is mixed Chinese-English segmentation currently supported? In my test the text was not segmented correctly. In [4]: for t, f in seg.cut('this is a test sentence. 这个是计算广告的数据啊'): ...: print('%s %s' % (t, f)) ...: this x v i g s g g a g...

linhx13
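A common way to approach mixed-script input like the one in this report is to split the text into runs by script first and route each run to a suitable tokenizer. The sketch below only does the splitting and makes no claim about how THULAC works internally. `split_script_runs` is a hypothetical helper, and treating only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as Chinese is a simplifying assumption.

```python
import re
from typing import List, Tuple

# Assumption: "han" means the basic CJK Unified Ideographs block only.
_RUN_RE = re.compile(r"(?P<han>[\u4e00-\u9fff]+)|(?P<other>[^\u4e00-\u9fff]+)")


def split_script_runs(text: str) -> List[Tuple[str, str]]:
    """Split text into ("han" | "other", chunk) runs, dropping whitespace-only runs."""
    runs = []
    for match in _RUN_RE.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        chunk = match.group().strip()
        if chunk:
            runs.append((kind, chunk))
    return runs


if __name__ == "__main__":
    sample = "this is a test sentence. 这个是计算广告的数据啊"
    for kind, chunk in split_script_runs(sample):
        print(kind, chunk)
```

Each `han` run could then be passed to the Chinese segmenter (for example the `seg.cut` call from the report), while `other` runs go to a whitespace- and punctuation-based splitter.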

An error in the documentation and in the program's parameter help message

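``` #1.1. Command format C++ version ./thulac [-t2s] [-seg_only] [-deli delimeter] [-user userword.txt] read input from and write output to the command line ./thulac [-t2s] [-seg_only] [-deli delimeter] [-user userword.txt] [-intput inputfile] [-output outputfile] read input from and write output to text files (note: both must be UTF-8 text) ``` I found an error in the C++ version's README.md and in the program's usage message: the input file parameter should be "-input", not "-intput".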

leyiwang