
ANN subsample dataset: use mdspan input

Status: Open. tfeher opened this issue 1 year ago (0 comments).

Currently neighbors::detail::utils::subsample takes the dataset input as a plain pointer.

The input should be replaced with an mdspan. This was not done in #2077 because the following question needs to be clarified first:

What is the right way to map a pointer to the RAFT mdspan API if I do not know (and do not care) whether it points to host or device memory?

One way is to query the pointer attributes and create the matching view:

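  // Since CUDA 11, plain pageable host memory yields cudaSuccess with
  // attr.type == cudaMemoryTypeUnregistered and attr.devicePointer == nullptr.
  cudaPointerAttributes attr;
  RAFT_CUDA_TRY(cudaPointerGetAttributes(&attr, input));
  const T* ptr = static_cast<const T*>(attr.devicePointer);
  if (ptr != nullptr) {
    // Device-accessible memory (device, managed, or pinned/registered host memory).
    auto dataset = raft::make_device_matrix_view<const T, IdxT>(ptr, n_samples, n_dim);
    my_function(res, dataset);
  } else {
    // Pageable host memory: only accessible from the host.
    auto dataset = raft::make_host_matrix_view<const T, IdxT>(input, n_samples, n_dim);
    my_function(res, dataset);
  }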
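The two branches above differ only in the view type they build. Below is a minimal standard-C++ sketch (no CUDA or RAFT required) of turning the runtime answer into a compile-time tag and calling one generic callback, so the call to the downstream function is written only once. Everything in it is hypothetical: mem_location stands in for what cudaPointerGetAttributes would report, and host_tag, device_tag, matrix_view and dispatch_by_location are illustrative names, not RAFT types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the result of cudaPointerGetAttributes;
// in real code it would be derived from attr.type / attr.devicePointer.
enum class mem_location { host, device };

// Hypothetical compile-time tags (loosely mimicking host vs. device views).
struct host_tag {};
struct device_tag {};

// Minimal row-major matrix view: pointer, extents, and a location tag.
template <typename T, typename IdxT, typename Tag>
struct matrix_view {
  using tag = Tag;
  T* data;
  IdxT n_rows;
  IdxT n_cols;
};

// Build the view whose tag matches the runtime location and call f exactly once.
// f must return the same type for both view types (e.g. a generic lambda).
template <typename T, typename IdxT, typename F>
auto dispatch_by_location(mem_location loc, T* ptr, IdxT n_rows, IdxT n_cols, F&& f)
{
  // A non-empty matrix must come with a data pointer.
  assert(ptr != nullptr || n_rows * n_cols == 0);
  if (loc == mem_location::device) {
    return f(matrix_view<T, IdxT, device_tag>{ptr, n_rows, n_cols});
  }
  return f(matrix_view<T, IdxT, host_tag>{ptr, n_rows, n_cols});
}

// Example: the callback reports whether it received a device-tagged view.
inline bool got_device_view(mem_location loc)
{
  const float buf[6] = {0, 1, 2, 3, 4, 5};
  return dispatch_by_location(loc, buf, 2, 3, [](auto view) {
    return std::is_same_v<typename decltype(view)::tag, device_tag>;
  });
}
```

The generic lambda is instantiated once per tag, so the downstream code still sees a fully typed host or device view, while the caller writes the forwarding call only once.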

But if my_function only passes the arrays on to a third function, I would need a plain mdspan without any host or device annotation. Should we work with plain std::experimental::mdspan, or should we allow a host_device_accessor that carries no information about where the data is accessible?
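To make the second option concrete, here is a hypothetical standard-C++ model (not RAFT code) of a view whose tag carries host/device accessibility flags, including an "unknown" tag with both flags false. A pass-through function accepts any tag because it never touches the data, while a function that reads on the host rejects, at compile time, views not known to be host-accessible. location_tag, unknown_tag, matrix_view, forward_rows and host_first are illustrative names only, loosely inspired by the idea behind host_device_accessor rather than its actual definition.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical location tag: two compile-time accessibility flags.
template <bool Host, bool Device>
struct location_tag {
  static constexpr bool is_host_accessible   = Host;
  static constexpr bool is_device_accessible = Device;
};
using host_tag    = location_tag<true, false>;
using device_tag  = location_tag<false, true>;
using unknown_tag = location_tag<false, false>;  // no information about location

// Minimal row-major matrix view carrying the tag as a type.
template <typename T, typename IdxT, typename Tag>
struct matrix_view {
  using tag = Tag;
  T* data;
  IdxT n_rows;
  IdxT n_cols;
};

// Pass-through code never dereferences the data, so any tag is acceptable.
template <typename T, typename IdxT, typename Tag>
IdxT forward_rows(matrix_view<T, IdxT, Tag> v)
{
  return v.n_rows;
}

// Code that reads on the host requires a tag known to be host-accessible;
// passing a view tagged unknown_tag or device_tag fails to compile.
template <typename T, typename IdxT, typename Tag>
std::remove_const_t<T> host_first(matrix_view<T, IdxT, Tag> v)
{
  static_assert(Tag::is_host_accessible, "host_first needs host-accessible data");
  assert(v.n_rows > 0 && v.n_cols > 0);
  return v.data[0];
}
```

In this model, pass-through layers are written once for every tag and only the functions that actually dereference memory state a location requirement; the trade-off is that an "unknown" tag defers the host/device check to whoever finally reads the data.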

tfeher, Jan 23 '24 08:01