rocPRIM
[Question]: WarpSortConfig::PartitioningThreshold=3000, how was this magic number chosen?
Hi ROCm experts,
I'm wondering how the magic number 3000 was chosen here.
When benchmarking a matrix of shape [m, n], if m < 3000 then segmented_radix_sort_impl() never takes the do_partitioning path. That path appears to select more fine-grained kernels depending on the segment counts.
CUDA's CUB, on the other hand, can do a segmented sort per row. Should we expect a performance gap here?
Thanks for the guidance.
Hi, these values are determined by our autotuning system. We run it on a set of GPUs, where it compiles and benchmarks the algorithms over a range of parameters. A developer-oriented explanation is given here.
If you believe that there is a performance issue there, you have a few options:
- You can pass a custom config for that particular operation where you manually set the values.
- You can also add a benchmark case in the benchmark for segmented radix sort for your dimensions, and run the tuning yourself.
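To make the role of the threshold concrete, here is a minimal sketch of the dispatch decision as described in the question. It does not call rocPRIM: `WarpSortConfigSketch`, `DispatchPath`, and `choose_path` are hypothetical names made up for illustration, and the `>=` comparison is an assumption inferred from the statement that m < 3000 never reaches the partitioning path. Check the actual condition in the rocPRIM source before relying on it.

```cpp
#include <cstddef>

// Hypothetical stand-in for the warp-sort part of a segmented radix sort
// config. Only the field discussed in this thread is modeled; this is NOT
// rocPRIM's actual type.
struct WarpSortConfigSketch
{
    unsigned int partitioning_threshold = 3000; // default value from the issue title
};

// The two paths mentioned in the question.
enum class DispatchPath
{
    SingleKernel, // all segments handled without partitioning
    Partitioned   // segments are first grouped, then sorted by specialized kernels
};

// Assumed rule: partition only once the number of segments reaches the threshold.
inline DispatchPath choose_path(std::size_t segment_count, const WarpSortConfigSketch& cfg)
{
    return segment_count >= cfg.partitioning_threshold ? DispatchPath::Partitioned
                                                       : DispatchPath::SingleKernel;
}
```

Under these assumptions, a [2999, n] matrix stays on the non-partitioned path with the default value, whereas a custom config with a lower threshold (the first option above) would move the same input onto the partitioned path. Whether that is faster for a given shape is what the tuning benchmark in the second option would measure.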
Hi @ZJLi2013. Has your issue been resolved? If so, please close the ticket. Thanks!
Hi @ZJLi2013 Closing ticket. Please feel free to re-open if you still need assistance. Thanks!