BubbleCal

32 comments by BubbleCal

[This](https://www.usenix.org/system/files/osdi23-zhang-qianxi_1.pdf) might help

The current indexing implementation performs the transform (converting raw vectors to PQ codes; SQ is constructed after this) while shuffling, and the refactoring will change this because: 1. for some kinds...
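For readers unfamiliar with the transform step: converting a raw vector to PQ codes means splitting it into equal-width sub-vectors and replacing each one with the index of its nearest centroid in that sub-space's codebook. Below is a minimal sketch that assumes the codebooks are already trained. `pq_encode` and the toy `codebooks` are hypothetical names for illustration, not Lance's API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Hypothetical PQ transform: one centroid index per sub-vector.

    codebooks[i] holds the (pre-trained) centroids of the i-th sub-space;
    all sub-spaces are assumed to have the same width.
    """
    num_sub_vectors = len(codebooks)
    if len(vector) % num_sub_vectors != 0:
        raise ValueError("dimension must be divisible by the number of sub-vectors")
    width = len(vector) // num_sub_vectors
    codes = []
    for i, centroids in enumerate(codebooks):
        sub = vector[i * width:(i + 1) * width]
        # Pick the index of the centroid closest to this slice of the vector.
        codes.append(min(range(len(centroids)),
                         key=lambda c: squared_l2(sub, centroids[c])))
    return codes


# Toy codebooks: 2 sub-spaces of width 2, 2 centroids each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [5.0, 5.0]],
]
print(pq_encode([0.9, 1.1, 4.8, 5.2], codebooks))  # [1, 1]
```

With at most 256 centroids per sub-space, each code fits in a single byte.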

@westonpace the new indexing follows these steps:
1. train IVF
2. shuffle the dataset into partition files with schema `| ROW_ID | VECTOR | PART_ID |`
3. for each partition,...
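The first two steps above can be sketched in plain Python. This is a toy illustration, not Lance's implementation: `train_ivf` and `shuffle_into_partitions` are hypothetical names, IVF training is assumed to be plain k-means (Lloyd's algorithm) on squared L2 distance, row IDs are simply list positions, and "partition files" are modelled as in-memory lists of `(ROW_ID, VECTOR, PART_ID)` tuples. Step 3 is truncated in the original comment, so the sketch stops after the shuffle.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Row = Tuple[int, Vector, int]  # (ROW_ID, VECTOR, PART_ID)


def squared_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[List[float]], v: Vector) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: squared_l2(v, centroids[i]))


def train_ivf(vectors: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Step 1 (hypothetical): train k IVF centroids with plain k-means."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        sums = [[0.0] * len(vectors[0]) for _ in range(k)]
        counts = [0] * k
        for v in vectors:
            c = nearest(centroids, v)
            counts[c] += 1
            for d, x in enumerate(v):
                sums[c][d] += x
        for c in range(k):
            if counts[c]:  # an empty cluster keeps its previous centroid
                centroids[c] = [s / counts[c] for s in sums[c]]
    return centroids


def shuffle_into_partitions(vectors: List[Vector], centroids: List[List[float]]) -> Dict[int, List[Row]]:
    """Step 2 (hypothetical): tag each row with its PART_ID and group by partition."""
    partitions: Dict[int, List[Row]] = defaultdict(list)
    for row_id, v in enumerate(vectors):
        part_id = nearest(centroids, v)
        partitions[part_id].append((row_id, v, part_id))
    return dict(partitions)


vectors = [[0.0, 0.0], [0.1, 0.2], [10.0, 10.0], [10.2, 9.9]]
centroids = train_ivf(vectors, k=2)
parts = shuffle_into_partitions(vectors, centroids)
for part_id, rows in sorted(parts.items()):
    print(part_id, [row_id for row_id, _, _ in rows])
```

Grouping rows by `PART_ID` up front means each partition can later be processed independently of the others.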

> Testing locally I see a bunch of failures from `test_create_ivf_hnsw_with_empty_partition`. I suspect the issue is that the call to `shuffle_dataset` is running more quickly than it did before and...

Moved to https://github.com/lancedb/lance/pull/3198

@westonpace @wjones127 Just moved the FTS to a new `FtsExec` node, and blocked it from being used together with vector search for now.

Thanks for the feedback, @zhidongqu-db. `ef` significantly impacts recall for HNSW; Lance uses `ef = 1.5 * k` by default, which differs from faiss. You can...
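To see why `ef` matters: in HNSW search, `ef` is the size of the dynamic candidate list kept while walking a layer of the graph, so a larger `ef` explores more nodes (higher recall, more distance computations) before the top `k` are returned. The sketch below is a generic single-layer best-first search, not Lance's implementation. `search_layer`, `knn` and `default_ef` are hypothetical names, and rounding `1.5 * k` up is an assumption, since the comment only states the ratio.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def default_ef(k: int) -> int:
    # Assumption: 1.5 * k rounded up, never below k.
    return max(k, math.ceil(1.5 * k))


def search_layer(graph: Dict[int, List[int]], vectors: Dict[int, Vector],
                 query: Vector, entry: int, ef: int) -> List[int]:
    """Best-first search on one graph layer, keeping at most ef results."""
    d0 = squared_l2(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # best remaining candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = squared_l2(query, vectors[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    # Return node ids sorted by ascending distance.
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in results)]


def knn(graph: Dict[int, List[int]], vectors: Dict[int, Vector], query: Vector,
        entry: int, k: int, ef: Optional[int] = None) -> List[int]:
    ef = default_ef(k) if ef is None else max(ef, k)
    return search_layer(graph, vectors, query, entry, ef)[:k]


# Toy 1-D "graph": points 0..9 on a line, each linked to its neighbours.
vectors = {i: [float(i)] for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(knn(graph, vectors, [7.2], entry=0, k=2))  # [7, 8]
```

Raising `ef` above the default trades latency for recall; the returned list is always cut to `k`.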

@456258zaq I don't think there's a way to do this. What's your use case? Not changing the cluster centroids would lead to inaccurate results.

I will look into this this week. By the way, I'd recommend not using HNSW_PQ: its performance is much worse than HNSW_SQ, and we are deprecating it. But I...