BubbleCal
> Thank you! After executing createIndex, how does the performance of full-text search compare to using sparse vector search in some vector databases? I'd expect there're no much differences, the...
We are migrating FTS from tantivy to the lance implementation. The new implementation supports specifying the column to search, but for now it doesn't support indexing/searching...
I think we support Chinese in lance, but not in lancedb for now. I will take a look at how to integrate it into lancedb.
This feature is on the roadmap, but we don't have enough people to work on it. For now, you can query Chinese/Japanese text with the [ngram tokenizer](https://lancedb.com/docs/search/full-text-search/#search-for-substring).
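To illustrate why an n-gram tokenizer helps with text that has no whitespace word boundaries, here is a minimal pure-Python sketch. It is not lance's or lancedb's implementation; the helper names (`ngrams`, `build_index`, `search`), the bigram size, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 2) -> list[str]:
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs: dict[int, str], n: int = 2) -> dict[str, set[int]]:
    """Map each character n-gram to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index: dict[str, set[int]], docs: dict[int, str],
           query: str, n: int = 2) -> set[int]:
    """Return ids of documents that contain `query` as a substring."""
    grams = ngrams(query, n)
    if not grams:
        return set()  # query is shorter than n, so no n-gram can match
    # A matching document must contain every n-gram of the query.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing all n-grams does not guarantee they are contiguous, so verify.
    return {doc_id for doc_id in candidates if query in docs[doc_id]}


# Hypothetical sample documents (Japanese, Chinese, English).
docs = {
    1: "全文検索エンジン",
    2: "向量数据库支持全文搜索",
    3: "vector index",
}
index = build_index(docs)
```

With this sketch, `search(index, docs, "数据库")` finds document 2 without any word segmentation, because the query is matched through its character bigrams rather than through whole-word tokens.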
lance requires enough data to create a vector index; we may remove this limitation in the future. Related to https://github.com/lancedb/lance/issues/3940
> Something seems off in the algorithm, with how `missed_similarities` is handled. Could you address my comment, and also maybe write a unit tests that shows we get correct results?...
We have now implemented the new tokenization; see details [here](https://lancedb.github.io/lancedb/fts/).
It looks like `Cargo.lock` needs to be updated. Going into the `python` directory and running `cargo check` should update it. @lyang24
The failing cases are not related to this PR, so merging it. Thanks for the contribution! @lyang24