BubbleCal comments




feat: support delta merge for IVF_HNSW_SQ

This is not efficient: it only skips the IVF and quantization training, and the graph is still reconstructed. I will leave the optimization (reusing the existing graph) to the next...

perf: do not run take if refine is not specified

Stage 5 of `create_plan` also adds a take node.

Full text search (FTS) indices

## What is this for

With full-text search, we can retrieve document data more efficiently, and with BM25 we can rank the results to reach...
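A minimal sketch of how BM25 ranks documents, over pre-tokenized text. This is a generic Okapi BM25 illustration, not Lance's FTS implementation; the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the Lucene-style `log(1 + ...)` IDF are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Hypothetical helper: score each tokenized doc against a tokenized query."""
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n  # average document length

    # Document frequency: in how many docs each term appears at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this doc
        score = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Lucene-style IDF, always positive.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Saturating TF, normalized by doc length relative to the average.
            norm = f + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["lance", "vector", "index"],
    ["full", "text", "search"],
    ["text", "search", "bm25", "text"],
]
print(bm25_scores(["text"], docs))
```

Sorting documents by these scores gives the ranked result; the document that mentions "text" twice scores higher than the one that mentions it once, while `k1` caps how much repeated terms can help.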

Full text search (FTS) indices

To get it working as soon as possible, I haven't integrated it into the filter expression; instead, I just added a new interface to execute the full-text search, and may remove...

feat: replicate boundary vectors to multiple partitions

> just one concern: can we define `compute_partitions` in terms of `compute_multiple_partitions` instead of duplicating the code?
>
> same for `compute_multiple_memberships`

Sure, we can. I just duplicated the code for...
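The reviewer's suggestion, defining the single-partition function in terms of the multi-partition one, can be sketched as below. The real `compute_partitions` and `compute_multiple_partitions` live in the Rust codebase and their signatures are not shown here, so this Python sketch uses hypothetical names (`nearest_partitions`, `nearest_partition`) and assumes squared-L2 distance to IVF centroids.

```python
from typing import List, Sequence


def nearest_partitions(vector: Sequence[float],
                       centroids: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical: indices of the k nearest centroids (squared L2), nearest first.

    With k > 1, a vector near a partition boundary is assigned to several
    partitions, i.e. it is replicated.
    """
    dists = [sum((a - c) ** 2 for a, c in zip(vector, centroid)) for centroid in centroids]
    return sorted(range(len(centroids)), key=lambda i: dists[i])[:k]


def nearest_partition(vector: Sequence[float],
                      centroids: Sequence[Sequence[float]]) -> int:
    """Hypothetical: single-partition assignment reusing the multi-partition code."""
    return nearest_partitions(vector, centroids, 1)[0]


centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
print(nearest_partitions([1.0, 0.0], centroids, 2))
print(nearest_partition([6.0, 0.0], centroids))
```

Keeping one code path means any change to the distance computation or tie-breaking automatically applies to both the single and multi-partition cases.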

bug(python): Embeddings with lots of zeros cause numeric instability when indexing.

Hi @koaning, I tried to reproduce your first panic by creating an index on vectors with lots of zeros. I got the same warning logs as you,...

bug(python): Embeddings with lots of zeros cause numeric instability when indexing.

Hi @koaning, TL;DR: you can resolve the issue by creating the index with the params below:
- `num_partitions` should be `num_rows / 1,000,000` or `sqrt(num_rows)`, but at least 1. The default...
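The two rules of thumb for `num_partitions` can be computed as follows. This is a minimal sketch: `suggest_num_partitions` is a hypothetical helper, not a LanceDB API, and rounding down to an integer before clamping to 1 is an assumption, since the comment does not say how to round.

```python
import math
from typing import Tuple


def suggest_num_partitions(num_rows: int,
                           rows_per_partition: int = 1_000_000) -> Tuple[int, int]:
    """Hypothetical helper: return (by_ratio, by_sqrt), each clamped to at least 1."""
    by_ratio = max(1, num_rows // rows_per_partition)  # num_rows / 1,000,000, rounded down
    by_sqrt = max(1, math.isqrt(num_rows))             # sqrt(num_rows), rounded down
    return by_ratio, by_sqrt


print(suggest_num_partitions(10_000))
print(suggest_num_partitions(4_000_000))
```

For small tables the ratio rule bottoms out at the minimum of 1 partition, which is why the "at least 1" clamp matters.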

bug(python): Embeddings with lots of zeros cause numeric instability when indexing.

For the warning logs: PQ training also divides the data into partitions, and the number of partitions (centroids) is `pow(2, num_bits)`. By default `num_bits=8`, so it's 256 centroids. Try to create...
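The centroid arithmetic can be made concrete with a small sketch of a PQ codebook's shape. The helper names are hypothetical, the requirement that `dim` divide evenly by `num_sub_vectors` is a standard PQ assumption made here, and the "one training row per centroid" bound is a general k-means property; tying it to the specific warnings in this issue is an assumption, since the comment is cut off.

```python
from typing import Tuple


def pq_codebook_shape(dim: int, num_sub_vectors: int,
                      num_bits: int = 8) -> Tuple[int, int, int]:
    """Hypothetical: (num_sub_vectors, centroids per sub-vector, sub-vector dim)."""
    if dim % num_sub_vectors != 0:
        raise ValueError("dim must be divisible by num_sub_vectors")
    num_centroids = 2 ** num_bits  # pow(2, num_bits): 256 when num_bits=8
    return num_sub_vectors, num_centroids, dim // num_sub_vectors


def min_training_rows(num_bits: int = 8) -> int:
    """Hypothetical: k-means needs at least one row per centroid, or some clusters stay empty."""
    return 2 ** num_bits


print(pq_codebook_shape(768, 96))
print(min_training_rows())
```

Each sub-vector gets its own set of `2 ** num_bits` centroids, so lowering `num_bits` shrinks the number of clusters k-means has to fill.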

bug(python): Embeddings with lots of zeros cause numeric instability when indexing.

Oh, you are right... LanceDB doesn't expose this param. Setting `num_sub_vectors=1` should work if you have enough rows.

chore: fix passing wrong prefetch hint

> Please fix it in both files before landing (InMemoryVectorStorage as well)

That has been fixed by another PR; this is the only one left.