The performance of radix sort is lower than CUB or hipPRIM

Open xqch1983 opened this issue 6 years ago • 2 comments

Has any work been done to improve the performance of radix_sort_by_key()? In my tests it takes about 11 ms per 1M elements, compared with about 1.15 ms per 1M elements in rocPRIM (OpenCL) and CUB (CUDA).
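For context on the operation being timed: a sort-by-key reorders a values array so that it follows the sorted order of a keys array. The sketch below is a plain CPU reference for a least-significant-digit radix sort by key, added for illustration only. It is not the Boost.Compute, rocPRIM, or CUB implementation, and the function name `radix_sort_by_key_ref`, the 32-bit unsigned key and value types, and the 8-bit digit width are all assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not a Boost.Compute API): stable LSD radix
// sort of 32-bit unsigned keys, moving each value together with its key.
void radix_sort_by_key_ref(std::vector<std::uint32_t>& keys,
                           std::vector<std::uint32_t>& values)
{
    assert(keys.size() == values.size());
    const std::size_t n = keys.size();
    std::vector<std::uint32_t> tmp_keys(n), tmp_values(n);

    // Four passes of 8 bits each, least significant digit first.
    for (unsigned shift = 0; shift < 32; shift += 8) {
        // 1. Histogram: count how many keys have each digit value.
        std::array<std::size_t, 256> count{};
        for (std::size_t i = 0; i < n; ++i)
            ++count[(keys[i] >> shift) & 0xFFu];

        // 2. Exclusive prefix sum: first output slot for each digit value.
        std::size_t offset = 0;
        for (std::size_t d = 0; d < 256; ++d) {
            const std::size_t c = count[d];
            count[d] = offset;
            offset += c;
        }

        // 3. Stable scatter: keys and values move to the same slot.
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t dst = count[(keys[i] >> shift) & 0xFFu]++;
            tmp_keys[dst] = keys[i];
            tmp_values[dst] = values[i];
        }

        // The scattered output becomes the input of the next pass.
        keys.swap(tmp_keys);
        values.swap(tmp_values);
    }
}
```

A reference like this can be used to check the output of a device sort on the same input before comparing timings. GPU radix sorts are commonly organized around the same three steps per pass (histogram, prefix sum, scatter), so differences in speed between libraries usually come from how those steps are parallelized and how many bits each pass handles.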

Jan 17 '19 02:01 xqch1983

Yeah, the performance of this algorithm has not been improved for some time. We should check how rocPRIM does it and try similar things.

Jan 17 '19 07:01 jszuppe

By the way, rocPRIM is not implemented in OpenCL. It is written in HIP and HC (AMD's C++ AMP).

Jan 17 '19 09:01 jszuppe