Wang Fenjin comments

Results: 147 comments of Wang Fenjin

Does the WCDB database support using this tokenizer?

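What WCDB ships with is probably more convenient to use. It lacks some of the functions provided here, so integrating this tokenizer may not be that easy. I haven't studied WCDB, but my impression is that it wraps the sqlite interface in its own layer.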

Searching by a single letter, such as a or b, returns unrelated characters

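U+6392: pái,pǎi,bài https://github.com/wangfenjin/simple/blob/master/contrib/pinyin.txt The reason is that 排 has multiple readings. Several people have indeed reported recently that the pinyin library used here is too complete and includes many rare readings. It may need to be replaced with a simplified version.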
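To illustrate why a single-letter query can match 排, here is a minimal sketch that parses one entry in the `U+XXXX: reading,reading` form quoted above and collects the first letter of each reading. The helper names `strip_tones` and `parse_entry` are hypothetical, and this is not how the tokenizer itself is implemented; it only shows that a character with several readings exposes several initials.

```python
import unicodedata


def strip_tones(syllable):
    """Remove tone marks by decomposing and dropping combining characters."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def parse_entry(line):
    """Parse a line such as 'U+6392: pái,pǎi,bài' into (character, plain readings)."""
    code, readings = line.split(":", 1)
    char = chr(int(code.strip()[2:], 16))  # skip the leading 'U+'
    plain = [strip_tones(r.strip()) for r in readings.split(",")]
    return char, plain


char, readings = parse_entry("U+6392: pái,pǎi,bài")
# Every distinct first letter is a prefix under which the character could match.
initials = sorted({r[0] for r in readings})
print(char, readings, initials)
```

Because the entry lists both a `p` and a `b` reading, a search for `b` can reach 排 even though its common reading starts with `p`. A pinyin file with fewer rare readings would shrink that set.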

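> U+6392: pái,pǎi,bài
>
> https://github.com/wangfenjin/simple/blob/master/contrib/pinyin.txt
>
> The reason is that 排 has multiple readings. Several people have indeed reported recently that the pinyin library used here is too complete and includes many rare readings. It may need to be replaced with a simplified version.

You can use jieba_query(). If pinyin is turned off completely, honestly there is no need to use this library. What could be done is to replace the pinyin file with https://github.com/mozillazg/pinyin-data/blob/master/kHanyuPinyin.txt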

Error when running on the Mac M1 simulator

```
~ sqlite3
SQLite version 3.46.0 2024-05-23 13:25:27
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load...
```

Can it be configured not to tokenize numbers?

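The problem you describe has nothing to do with the tokenizer. Whether a given field is indexed is decided when the table is created; the tokenizer only tokenizes whatever you ask it to tokenize. https://www.wangfenjin.com/posts/simple-jieba-tokenizer/ Take a look at this article, which explains how to organize the data structure.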
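As a rough sketch of deciding at table-creation time what gets indexed, the example below declares one FTS5 column as `UNINDEXED`, so its contents are stored but never tokenized. The table name `notes` and the columns `body` and `order_no` are made up for illustration, and the built-in `unicode61` tokenizer stands in for this project's tokenizer, which would need to be loaded as an extension first. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: `body` is full-text indexed, `order_no` is only stored.
conn.execute(
    "CREATE VIRTUAL TABLE notes "
    "USING fts5(body, order_no UNINDEXED, tokenize='unicode61')"
)
conn.execute(
    "INSERT INTO notes(body, order_no) VALUES (?, ?)",
    ("quarterly sales report", "20240523"),
)


def search(term):
    """Run a full-text query against the indexed columns of `notes`."""
    return conn.execute(
        "SELECT body, order_no FROM notes WHERE notes MATCH ?", (term,)
    ).fetchall()


body_hits = search("sales")        # matches through the indexed column
number_hits = search("20240523")   # the number lives only in the UNINDEXED column
print(body_hits, number_hits)
```

The numeric field is still returned with each matching row, but it cannot be reached through `MATCH`, which keeps numbers out of the index without changing the tokenizer.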

Would it be possible to make a patch that adds trigram support to the tokenizer?

PRs welcome

Would it be possible to make a patch that adds trigram support to the tokenizer?

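https://github.com/leiless/sqlite3-ngram @leiless Would you be interested in merging the logic of your library into this one? That said, I haven't yet figured out what advantage this feature has over jieba tokenization, or how to combine it with the pinyin feature.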

Would it be possible to make a patch that adds trigram support to the tokenizer?

Yeah, I feel the two really don't fit together functionally.

Upgrade to arrow 57

The arrow crate is really annoying: they keep bumping the version. I hope they can settle on some kind of stable release cycle...