Rohan Chitale comments

Results: 11 comments by Rohan Chitale

[BUG] cuVS Cagra Python API has low recall for inner product datasets

@lowener I've updated the issue with the recall I get when using FAISS. Strangely, the recall for `coherev2-dbpedia` is higher with cuVS than with FAISS. I also consistently see...
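
The recall numbers in this thread compare approximate neighbours against an exact baseline. Below is a minimal sketch of one common way to compute recall@k for an inner-product dataset. It assumes the ground truth comes from brute-force maximum inner product search; the thread does not say which k or which ground-truth files were used. `exact_top_k_ip` and `recall_at_k` are hypothetical helper names, not cuVS or FAISS APIs.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration; not part of cuVS or FAISS.


def exact_top_k_ip(queries: Sequence[Sequence[float]],
                   base: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Brute-force ground truth: ids of the k base vectors with the largest
    inner product with each query (ties broken by lower id)."""
    truth = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, v)) for v in base]
        order = sorted(range(len(base)), key=lambda i: (-scores[i], i))
        truth.append(order[:k])
    return truth


def recall_at_k(found: Sequence[Sequence[int]],
                truth: Sequence[Sequence[int]], k: int) -> float:
    """Fraction of the true top-k ids that appear in the returned top-k ids."""
    hits = sum(len(set(f[:k]) & set(t[:k])) for f, t in zip(found, truth))
    return hits / (k * len(truth))


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
    queries = [[1.0, 0.1]]
    truth = exact_top_k_ip(queries, base, k=2)   # [[0, 2]]
    approx = [[0, 1]]                            # pretend ANN output
    print(recall_at_k(approx, truth, k=2))       # 0.5
```

A real benchmark would load precomputed ground truth instead of recomputing it in pure Python, but the recall formula stays the same.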

Thanks @lowener. For reference, I used `cuvs` version 24.12 and built `faiss` from this commit: https://github.com/facebookresearch/faiss/commit/df6a8f6b4e6ed4c509e52d1e015f87fd742c17df

Interestingly, I found that using `ivf_pq_build_params` instead of `ivf_pq_params` at https://github.com/navneet1v/VectorSearchForge/blob/main/cuvs_benchmarks/main.py#L375 resulted in higher recall and faster index build times for the `marco-tasb` and `cohere-768-ip` datasets. I saw no difference...

Thank you @lowener! Will this code change also improve the recall we see with faiss for `cohere-768-ip`? The 82.6% recall we see with faiss is still lower than the recall...

Hi @lowener, I built faiss from source against cuvs 25.06 and verified that recall on the cohere-768 IP dataset is now 91.9% using faiss-cuvs. This is a big improvement...

[QST] How to predict the memory required to build an index with CagraIndex using IVFPQ

@tfeher Here are the `rmm` log results when using `mmap` to load the vectors: [rmm_log_mmap.csv](https://github.com/user-attachments/files/19915410/rmm_log_mmap.csv) Here are the `rmm` log results without using `mmap` to load...
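
To turn an `rmm` log like the ones attached into a single peak number, the allocation events can be replayed in order. This is a minimal sketch, assuming the CSV has `Action` and `Size` columns with `allocate`/`free` entries and sizes in bytes, as written by RMM's logging resource adaptor (check the header of your own file). It only accounts for allocations that went through RMM, so it can differ from the process total that `nvidia-smi` reports, which also includes things like the CUDA context. The helper name and the sample log are hypothetical.

```python
import csv
import io
from typing import Iterable, Tuple


def peak_from_rmm_log(lines: Iterable[str]) -> Tuple[int, int]:
    """Replay allocate/free events; return (peak_bytes, still_outstanding_bytes).

    Assumes a CSV header with 'Action' and 'Size' columns, Action being
    'allocate' or 'free' and Size in bytes. Any other action is ignored.
    """
    outstanding = 0
    peak = 0
    for row in csv.DictReader(lines, skipinitialspace=True):
        action = row["Action"].strip()
        if action == "allocate":
            outstanding += int(row["Size"])
            peak = max(peak, outstanding)
        elif action == "free":
            outstanding -= int(row["Size"])
    return peak, outstanding


# Tiny made-up log for illustration only (not data from this thread).
SAMPLE_LOG = """Thread,Time,Action,Pointer,Size,Stream
1,00:00:01,allocate,0x1,100,0x0
1,00:00:02,allocate,0x2,300,0x0
1,00:00:03,free,0x1,100,0x0
1,00:00:04,allocate,0x3,50,0x0
"""

if __name__ == "__main__":
    print(peak_from_rmm_log(io.StringIO(SAMPLE_LOG)))  # (400, 350)
```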

@tfeher Graph with mmap: ![Image](https://github.com/user-attachments/assets/575e6b49-d40d-4983-a431-7a168996aa24) Graph without mmap: ![Image](https://github.com/user-attachments/assets/6e56ddf5-6704-4873-abe9-193aa9c633ec) The memory usage stats don't match what I see when I use `nvidia-smi`; these graphs look exactly the same...
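
As a rough starting point for the memory question, the raw sizes of the main buffers can be computed by hand. This is a deliberately crude sketch, assuming float32 vectors, 4-byte neighbour ids, and that the build holds the dataset, an intermediate kNN graph of `intermediate_graph_degree` and a final graph of `graph_degree`. It ignores the IVF-PQ index itself, temporary workspaces, batching, and whether each buffer lives on the host or the device, so it is an illustrative lower bound rather than what cuVS actually allocates. The function name is hypothetical.

```python
from typing import Dict


def crude_cagra_build_bytes(n_vectors: int, dim: int,
                            intermediate_graph_degree: int, graph_degree: int,
                            bytes_per_element: int = 4,
                            bytes_per_index: int = 4) -> Dict[str, int]:
    """Hypothetical back-of-envelope estimate; NOT what cuVS actually allocates.

    Counts only the raw dataset, an intermediate kNN graph and the final graph,
    assuming float32 elements and 4-byte neighbour ids by default.
    """
    sizes = {
        "dataset": n_vectors * dim * bytes_per_element,
        "intermediate_graph": n_vectors * intermediate_graph_degree * bytes_per_index,
        "final_graph": n_vectors * graph_degree * bytes_per_index,
    }
    sizes["total"] = sum(sizes.values())
    return sizes


if __name__ == "__main__":
    est = crude_cagra_build_bytes(1_000_000, 768, 128, 64)
    for name, size in est.items():
        print(f"{name:>20}: {size / 2**30:.2f} GiB")
```

For 1,000,000 768-dimensional float32 vectors with degrees 128 and 64, this gives 3,840,000,000 bytes (about 3.6 GiB) before any IVF-PQ or workspace overhead; comparing that figure with a measured peak shows how much the unmodelled parts contribute.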

[RFC] Remote Vector Index Builder -- Worker Dataplane Design

Please note that we are open to feedback on this, as the feature is still a work in progress.

To add on: the worker calculates the resource requirements for a job up front, before deciding whether to accept or reject it. We use a pretty naive approach, that's...
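
The comment describes an up-front admission check: estimate what a job needs, then accept or reject it. The details of the RFC's "naive approach" are cut off above, so the following is only a hypothetical sketch of the general idea. `BuildJob`, the size formula and the 2.0 overhead factor are assumptions, not the design in the RFC.

```python
from dataclasses import dataclass


@dataclass
class BuildJob:
    """Hypothetical job description; not the RFC's actual schema."""
    n_vectors: int
    dim: int
    bytes_per_element: int = 4  # assumes float32 vectors


def estimated_job_bytes(job: BuildJob, overhead_factor: float = 2.0) -> int:
    """Naive estimate: raw vector bytes times an assumed safety factor."""
    return int(job.n_vectors * job.dim * job.bytes_per_element * overhead_factor)


def should_accept(job: BuildJob, free_bytes: int, reserved_bytes: int = 0) -> bool:
    """Accept only if the estimate fits in memory not already reserved for other jobs."""
    return estimated_job_bytes(job) <= free_bytes - reserved_bytes


if __name__ == "__main__":
    job = BuildJob(n_vectors=1_000, dim=128)
    print(estimated_job_bytes(job))                   # 1024000
    print(should_accept(job, 2_000_000))              # True
    print(should_accept(job, 2_000_000, 1_000_000))   # False
```

Tracking `reserved_bytes` for jobs that were accepted but have not yet allocated is one way to avoid admitting two jobs against the same free memory; whether the worker in the RFC does this is not stated.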