Looking at the IKAnalyzer implementation, it cannot handle words that mix Chinese and non-Chinese characters. Internally it has separate sub-tokenizers for CJK characters and for Latin letters (CJKSegmenter, LetterSegmenter), and each sub-tokenizer only accepts its own preset character classes: CJKSegmenter handles only CJK characters, while LetterSegmenter handles only Latin letters, digits, and some punctuation. As a result, Chinese characters and Latin letters are always split apart, even if you add the mixed word to the dictionary.
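To make the limitation concrete, here is a toy Python sketch of that routing scheme (not IKAnalyzer's actual code): each sub-tokenizer only accepts its own character class, so a script change always forces a token boundary, and no dictionary entry can bridge it.

```python
def is_cjk(ch):
    # CJK Unified Ideographs block (simplified check for illustration)
    return '\u4e00' <= ch <= '\u9fff'

def toy_segment(text):
    """Toy model of per-character-class sub-tokenizers: a character is
    routed to the 'cjk' or 'letter' bucket, and any change of bucket
    flushes the current token, so mixed-script words always split."""
    tokens, buf, kind = [], [], None
    for ch in text:
        k = 'cjk' if is_cjk(ch) else ('letter' if ch.isalnum() else None)
        if k != kind and buf:
            tokens.append(''.join(buf))
            buf = []
        if k is not None:
            buf.append(ch)
        kind = k
    if buf:
        tokens.append(''.join(buf))
    return tokens

# A mixed word like "Q币" is unavoidably split at the script boundary:
# toy_segment("Q币") -> ['Q', '币']
```

This is why adding a mixed Chinese/Latin entry to the dictionary has no effect: the split happens before dictionary matching is ever consulted.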
As far as I know, ta-lib doesn't support incremental calculation; its streaming API simply recalculates using only part of the data. See [http://www.kbasm.com/blog/ta-lib-not-incremental-and-wrong.html](http://www.kbasm.com/blog/ta-lib-not-incremental-and-wrong.html)
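For contrast, here is a minimal sketch (not ta-lib code) of what a truly incremental indicator looks like, using EMA as the example: the stateful version consumes one price at a time and stays in lockstep with a full batch recomputation, without ever re-reading history.

```python
def ema_batch(prices, period):
    """Batch EMA over a full price list (seeded with the first price)."""
    k = 2.0 / (period + 1)
    ema = prices[0]
    out = [ema]
    for p in prices[1:]:
        ema = p * k + ema * (1.0 - k)
        out.append(ema)
    return out

class IncrementalEMA:
    """Stateful EMA: update() takes one new price and returns the
    current EMA, keeping only O(1) state between calls."""
    def __init__(self, period):
        self.k = 2.0 / (period + 1)
        self.value = None

    def update(self, price):
        if self.value is None:
            self.value = price  # seed with the first observation
        else:
            self.value = price * self.k + self.value * (1.0 - self.k)
        return self.value
```

Feeding prices one by one through `IncrementalEMA` reproduces the batch result exactly, which is the property the blog post argues ta-lib's streaming API does not provide.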
@Unanimad Do you still plan to create an MR for the [elasticsearch-dbapi](https://github.com/preset-io/elasticsearch-dbapi) project?
I installed pcre from source and encountered the same problem. These are the steps I used to install pcre and python-pcre:

```
cd ~/Downloads/pcre-8.35
./configure --disable-stack-for-recursion --enable-unicode-properties --enable-utf
make
make install
cd ~/Downloads/python-pcre...
```