cub icon indicating copy to clipboard operation
cub copied to clipboard

Merge sort key type selection

Open gevtushenko opened this issue 3 years ago • 2 comments

Before porting to CUB, Thrust implementation of merge sort didn't use to have *copy version. When introducing Copy overload, I followed the CUB generic scheme of selecting output iterator value type. After that, we changed type selection for most algorithms so it no longer follows these scheme. Merge sort implementation, however, wasn't changed. This leads to compilation failures in some cases. @harrism provided the following reproducer:

#include <thrust/device_vector.h>
#include <thrust/sequence.h>

#include <thrust/iterator/transform_output_iterator.h>

#include <cub/device/device_merge_sort.cuh>

template <typename Tuple>
struct trajectory_comparator {
  __device__ bool operator()(Tuple const& lhs, Tuple const& rhs)
  {
    auto lhs_id = thrust::get<0>(lhs);
    auto rhs_id = thrust::get<0>(rhs);
    auto lhs_ts = thrust::get<1>(lhs);
    auto rhs_ts = thrust::get<1>(rhs);
    return (lhs_id < rhs_id) || ((lhs_id == rhs_id) && (lhs_ts < rhs_ts));
  };
};


struct foo {
    float x;
    float y;
};

struct bar {
    float x;
    float y;
};

struct foo_to_bar {
    bar operator()(foo const& f) { return bar{f.x, f.y}; }
};

struct double_foo {
    foo operator()(foo const& f) { return foo{f.x * 2.0f, f.y * 2.0f}; }
};


int main(void)
{
  thrust::device_vector<int> keys(1000);
  thrust::sequence(keys.begin(), keys.end());

  thrust::device_vector<foo> values(1000);

  thrust::device_vector<int> keys_out(1000, 0);
  
  #if 0 // transform iterator input and output types the same
  thrust::device_vector<foo> foo_values_out(1000);
  auto values_transformed_out = thrust::make_transform_output_iterator(foo_values_out.begin(), double_foo{});
  #else // transform iterator input and output types different
  thrust::device_vector<bar> bar_values_out(1000);
  auto values_transformed_out = thrust::make_transform_output_iterator(bar_values_out.begin(), foo_to_bar{});
  #endif

  std::size_t temp_storage_bytes = 0;
  cub::DeviceMergeSort::SortPairsCopy(nullptr,
                                      temp_storage_bytes,
                                      keys.begin(),
                                      values.begin(),
                                      keys_out.begin(),
                                      values_transformed_out,
                                      1000,
                                      thrust::less<int>{});

  void* temp_storage = nullptr;
  cudaMalloc(&temp_storage, temp_storage_bytes);

  cub::DeviceMergeSort::SortPairsCopy(temp_storage,
                                      temp_storage_bytes,
                                      keys.begin(),
                                      values.begin(),
                                      keys_out.begin(),
                                      values_transformed_out,
                                      1000,
                                      thrust::less<int>{});

  return 0;
}

The issue is related to the fact that we use output iterator value type to instantiate block load facility and then provide input iterator values that are not accepted by the block load member functions. We have to investigate if changing key/value type selection to be based on the input iterators is possible.

gevtushenko avatar Sep 13 '22 06:09 gevtushenko

Note that trajectory_comparator in that code is unused -- I copied it in by mistake.

harrism avatar Sep 13 '22 06:09 harrism

BTW, in this example it's not the key type selection that fails, it's the value type.

harrism avatar Sep 13 '22 06:09 harrism