
Allow custom tuning policies to be passed into device algorithms.

Open • sh1ng opened this issue 7 years ago • 1 comment

When I used an iterator as the input to a device-wide reduce, the reduction kernel was limited by the number of registers. The iterator performs a few math operations on data in global memory, plus branching. Decreasing the default parameters in dispatch_reduce.cuh gave a slight performance improvement, but I saw that it also affected the performance of a simple reduction. I kept the changes because they improve total performance. Do you think it would be wise to add an optional parameter to specify the execution policy for every device operation? What other tricks can be used to improve performance for a pipeline like: read data from global memory -> deterministic logic on it -> cub operation such as reduce?

sh1ng, Jan 05 '19 11:01

I agree that it would be very useful to provide custom tuning policies when invoking device algorithms. I think it might already be possible to inject one somewhere in the Device -> Dispatch -> Agent stack, but ideally we should provide a more user-friendly API that we can document and test.

alliepiper, Oct 21 '20 17:10