Jinsol Park
### Description Currently, `getcol()` for `cupyx.scipy.sparse._csr.csr_matrix` does not support -1 indexing. It returns an empty array if -1 is passed as the index. It would be nice to have this...
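The requested behavior comes down to mapping a negative column index onto its non-negative equivalent before the lookup. A minimal sketch of that mapping follows; `normalize_index` is a hypothetical helper written for illustration, not CuPy's actual implementation or API.

```python
def normalize_index(idx: int, size: int) -> int:
    """Map a possibly negative index onto the range [0, size).

    Hypothetical helper: it mirrors ordinary Python sequence indexing,
    where -1 refers to the last element.
    """
    if not -size <= idx < size:
        raise IndexError(f"index {idx} out of bounds for axis of size {size}")
    # Negative indices count from the end of the axis.
    return idx + size if idx < 0 else idx


# For a matrix with 5 columns, column -1 is the last column (index 4).
last_col = normalize_index(-1, 5)
```

With such a step applied first, a call with index -1 would select the last column instead of returning an empty array.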
### Description Adds a `build_algo` option to HDBSCAN, allowing HDBSCAN to build knn graphs using NN Descent. Users can now choose the knn graph build algorithm between "brute_force_knn" and "nn_descent"...
### Description Adds a `build_algo=nn_descent` option to UMAP. Users can now choose the knn graph build algorithm between `"brute_force_knn"` and `"nn_descent"` ``` from cuml.manifold.umap import UMAP as cuUMAP from cuml.datasets import...
Using NN Descent to build the knn graph for UMAP and HDBSCAN is working okay, but a few improvements can be made. Related PRs: - HDBSCAN + NN Descent [here](https://github.com/rapidsai/cuml/pull/5939)...
### Description The `test_membership_vector_circles` test in `cuml/python/cuml/tests/test_hdbscan.py` does not pass for different values of `n_points_to_predict`. Specifically, with `n_points_to_predict=90`, two issues occur: - results do not meet the 0.9 threshold - getting...
### Description Currently, `GNND::build()` has a `preprocess_data_kernel`, which requires a shared memory size of: ``` sizeof(Data_t) * ceildiv(build_config_.dataset_dim, static_cast(raft::warp_size())) * raft::warp_size() ``` Considering `sizeof(Data_t) = 4` for standard cases such as...
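The shared memory formula above rounds the dataset dimension up to a multiple of the warp size and multiplies by the element size. As a rough sketch of the arithmetic, assuming a warp size of 32 and `sizeof(Data_t) = 4` (the function name `preprocess_smem_bytes` is made up for illustration):

```python
def ceildiv(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def preprocess_smem_bytes(dataset_dim: int,
                          sizeof_data_t: int = 4,
                          warp_size: int = 32) -> int:
    """Bytes of shared memory implied by the formula quoted in the issue.

    Assumptions: warp_size defaults to 32 and sizeof_data_t to 4 (fp32).
    """
    # Round the dimension up to a whole number of warps, then scale by
    # the element size.
    return sizeof_data_t * ceildiv(dataset_dim, warp_size) * warp_size


# The requirement grows linearly with the (padded) dimension.
for dim in (32, 33, 1000):
    print(dim, preprocess_smem_bytes(dim))
```

Because the requirement scales with `dataset_dim`, a sufficiently high-dimensional dataset will exceed whatever per-block shared memory limit the target GPU imposes.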
### Description NN Descent shows low recall (compared to using brute force knn) for large datasets. This makes it difficult to scale up and out and use NN Descent for...
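Recall here is usually read as the fraction of true nearest neighbors (from brute force knn) that the approximate method also returns. A minimal sketch of that measurement, assuming both methods return one list of neighbor indices per row (`knn_recall` is a name chosen for this example, not a library function):

```python
from typing import Sequence


def knn_recall(approx: Sequence[Sequence[int]],
               exact: Sequence[Sequence[int]]) -> float:
    """Average fraction of exact neighbors recovered by the approximate graph."""
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx, exact):
        # Order within a row does not matter, only set membership.
        hits += len(set(approx_row) & set(exact_row))
        total += len(exact_row)
    return hits / total


# Toy data: 2 rows, k=2. Row 0 recovers 1 of 2 neighbors, row 1 recovers both.
exact_idx = [[1, 2], [0, 2]]
approx_idx = [[1, 3], [2, 0]]
recall = knn_recall(approx_idx, exact_idx)
```

On this toy input, 3 of the 4 exact neighbors are recovered.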
I hope NN Descent can support calculating distances in `local_join_kernel` using fp32 instead of fp16 (`__half`). Due to precision issues, it is difficult to grab the distances calculated by NN...
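The precision concern can be illustrated on the CPU by round-tripping values through IEEE 754 half precision. This is only a sketch of the rounding effect using Python's `struct` module, not the `local_join_kernel` code itself; `to_half` is a helper defined here for illustration.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest fp16 value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# fp16 has an 11-bit significand, so values in [2048, 4096) are spaced 2 apart.
# Two distinct distances can therefore collapse to the same fp16 value.
d1 = to_half(2048.0)
d2 = to_half(2048.5)

# Small values also pick up rounding error.
err = abs(to_half(0.1) - 0.1)
```

Here `d1` and `d2` end up identical, which is the kind of collapse that makes fp16 distances hard to reuse downstream.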
**Describe the bug** For small datasets (`n_rows=100`), NN Descent breaks out of the iteration loop early (after 1 iteration) and does not properly return update indices. Specifically, this behavior happens...
PR for future reference on putting data on host for HDBSCAN so that it scales to large datasets. **No reviews needed.** In `reachability.cuh`, currently using an `optimize()` function from cuvs...