Tamas Bela Feher
PR #2157 enables adding vectors to a CAGRA index. Some issues can come up with large datasets or graphs: - a graph in host memory is not supported by extend https://github.com/rapidsai/raft/pull/2157/files#r1561657193...
**Describe the bug** Similar to https://github.com/rapidsai/raft/pull/2183, the mean and stddev kernels also have potential out-of-bounds accesses: https://github.com/rapidsai/raft/blob/branch-24.04/cpp/include/raft/stats/detail/stddev.cuh#L48 https://github.com/rapidsai/raft/blob/branch-24.04/cpp/include/raft/stats/detail/mean.cuh#L46 Additionally, the [minmax](https://github.com/rapidsai/raft/blob/67893676f3d9b90e572f78b969172f840115b22f/cpp/include/raft/stats/detail/minmax.cuh#L159minmax) kernel should also be checked, whether it...
**Describe the bug** The [sum kernel](https://github.com/rapidsai/raft/blob/67893676f3d9b90e572f78b969172f840115b22f/cpp/include/raft/stats/detail/sum.cuh#L65) does not handle underflows correctly, and that leads to inaccurate results. **Steps/Code to reproduce bug** As reported by @lijinf2: > We also did an...
**Is your feature request related to a problem? Please describe.** During IVF-Flat search a query vector is compared to all the vectors from `n_probes` clusters, and we have `n_queries *...
**Describe the bug** `knn_merge_parts` is only implemented for k <= 1024: https://github.com/rapidsai/raft/blob/eb6fdef68d19357e9f44494653ecd4206340ff6b/cpp/include/raft/neighbors/detail/knn_merge_parts.cuh#L149-L171 `knn_merge_parts` is used during brute force search if: - an offset index needs to be added to the indices. This...
Currently, CAGRA+HNSW benchmarks with raft_ann_bench require a GPU to run. While a GPU is essential for building the index with CAGRA, it would be useful to be able to compile and run...
Currently, [neighbors::detail::utils::subsample](https://github.com/rapidsai/raft/pull/2077/files#diff-f4662666209658cc0fc710aae66eb045de253eff2c46339a36daf87e29eaf6e8R612) takes the dataset `input` as a plain pointer. `input` should be replaced with an mdspan. This is not done in #2077, because the following question needs to be...
**Describe the bug** When `raft::copy` is used to copy data between two mdspans, the copy is very slow. **Steps/Code to reproduce bug** Compare the execution time of these loops:...
**Is your feature request related to a problem? Please describe.** For IVF-Flat and IVF-PQ index building, large datasets are provided in host memory or as an `mmap`-ed file. After the cluster...
In IVF-Flat and IVF-PQ, we generate random indices and shuffle or subsample the dataset using these indices before training. Currently a fixed seed is used to generate random indices. This...
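
To illustrate the seeding question in the last item, here is a minimal sketch of index-based subsampling in which the seed is a parameter instead of a hard-coded constant. It is not RAFT code: `subsample_indices`, `subsample`, and the `seed` argument are hypothetical names introduced only for this illustration, and plain Python lists stand in for the dataset.

```python
import random


def subsample_indices(n_rows, n_samples, seed=0):
    """Pick n_samples distinct row indices from range(n_rows).

    Hypothetical helper, not a RAFT API. The seed is caller-controlled:
    the same seed always yields the same indices, a different seed
    draws from a different random stream.
    """
    if not 0 <= n_samples <= n_rows:
        raise ValueError("n_samples must be between 0 and n_rows")
    rng = random.Random(seed)  # private generator, no global state touched
    # Sampling without replacement; sorting keeps the reads in row order.
    return sorted(rng.sample(range(n_rows), n_samples))


def subsample(dataset, n_samples, seed=0):
    """Return the rows of `dataset` selected by subsample_indices."""
    indices = subsample_indices(len(dataset), n_samples, seed)
    return [dataset[i] for i in indices]
```

With this shape, a caller that needs reproducible training keeps the default seed, while a caller that wants varied training subsets passes its own value.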