rocPRIM

[Question]: WarpSortConfig::PartitioningThreshold=3000, how was this magic number chosen?

Open ZJLi2013 opened this issue 1 year ago • 1 comments

Hi ROCm experts,

I'm wondering how the magic number 3000 was chosen here.

When benchmarking a matrix of shape [m, n], if m < 3000 then segmented_radix_sort_impl() never takes the do_partitioning path, which appears to contain more fine-grained kernels selected according to the different segment_counts.

On the other hand, CUDA CUB can sort segments per row. Should we expect a performance gap here?

Thanks for the guidance.

ZJLi2013, Sep 12 '24 09:09

Hi, these values are determined by our autotuning system. We run it on a set of GPUs, where it compiles and benchmarks the algorithms over a range of parameters. A developer-oriented explanation is given here.

If you believe that there is a performance issue there, you have a few options:

  • You can pass a custom config for that particular operation where you manually set the values.
  • You can also add a benchmark case in the benchmark for segmented radix sort for your dimensions, and run the tuning yourself.
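
To illustrate what overriding the threshold in a custom config would change, here is a minimal, self-contained sketch. It does not include or reproduce the rocPRIM headers: `mock_warp_sort_config` and `takes_partitioning_path` are hypothetical stand-ins, and the `>=` comparison is an assumption inferred from the behaviour described in the question (no partitioning when m < 3000).

```cpp
#include <cstddef>

// Hypothetical stand-in for a warp-sort config. This is NOT the real rocPRIM
// type; it only carries the threshold as a compile-time constant.
template <unsigned int PartitioningThreshold = 3000>
struct mock_warp_sort_config
{
    static constexpr unsigned int partitioning_threshold = PartitioningThreshold;
};

// Assumed dispatch rule: the partitioning path is taken only when the number
// of segments reaches the configured threshold.
template <class Config>
constexpr bool takes_partitioning_path(std::size_t segments)
{
    return segments >= Config::partitioning_threshold;
}

// With the default threshold, a [1024, n] input stays on the unpartitioned path.
static_assert(!takes_partitioning_path<mock_warp_sort_config<>>(1024),
              "default threshold: 1024 segments are not partitioned");

// With a lower, manually chosen threshold, the same input is partitioned.
static_assert(takes_partitioning_path<mock_warp_sort_config<512>>(1024),
              "custom threshold: 1024 segments are partitioned");
```

Whether a lower threshold is actually faster for a given shape is exactly what the benchmark and tuning run would need to show; the sketch only shows which path is selected.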

Snektron, Sep 12 '24 12:09

Hi @ZJLi2013. Has your issue been resolved? If so, please close the ticket. Thanks!

ppanchad-amd, Nov 25 '24 19:11

Hi @ZJLi2013 Closing ticket. Please feel free to re-open if you still need assistance. Thanks!

ppanchad-amd, Apr 15 '25 14:04