[FEA]: DeviceReduce argument version with user-defined reduction

Open yjian012 opened this issue 10 months ago • 6 comments

Is this a duplicate?

  • [X] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

CUB

Is your feature request related to a problem? Please describe.

I encountered this problem: given an unsigned integer array, determine whether any bit is set to 1, and if so, find the location of the first one. I can use DeviceReduce::Max to check whether there are any 1 bits, but I'm not sure how to do the rest. One way is to create a new array that maps each element to 0 if it is zero and 1 if it is non-zero, run ArgMax on it to find the index, then copy that element of the original array to the host and find the bit there, but that's kind of awkward. The alternative is to copy the entire array to the host and do it on the CPU.

Describe the solution you'd like

If there were an Arg-style reduction (like ArgMin/ArgMax) that accepts a user-defined reduction operator, this would be much easier.

Describe alternatives you've considered

No response

Additional context

No response

yjian012 Apr 10 '24 01:04

It sounds like what you're looking for is cub::DeviceReduce::ArgMax?

https://nvidia.github.io/cccl/cub/api/structcub_1_1DeviceReduce.html#_CPPv4I00EN3cub12DeviceReduce6ArgMaxE11cudaError_tPvR6size_t14InputIteratorT15OutputIteratorTi12cudaStream_t

jrhemstad Apr 10 '24 15:04

Not ArgMax. E.g., if the array uint8_t numbers[4] is something like '00000000 00000000 00010001 11011000', I want to find where the first time a '1' appears, i.e. numbers[3], 4th bit from left. But ArgMax will return the index of the largest number, which here is the last one. So the reduction should treat all non-zero values as equal to each other, which can be done in a custom function. cub::DeviceReduce::Reduce takes custom functions, but there's no Arg equivalent.

yjian012 Apr 11 '24 04:04
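
The custom reduction described above can be sketched on the host in plain C++. This is only an illustration of the operator's logic, not a CUB API: `IndexedByte`, `first_nonzero_op`, and `arg_first_nonzero` are hypothetical names, and the loop is a sequential stand-in for a device-wide reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical index/value pair, similar in spirit to cub::KeyValuePair.
struct IndexedByte {
  std::size_t key;     // position in the array
  std::uint8_t value;  // byte stored at that position
};

// Hypothetical reduction operator: every non-zero value compares equal,
// so among non-zero candidates the smaller index wins. A zero value only
// survives if both operands are zero.
inline IndexedByte first_nonzero_op(const IndexedByte& a, const IndexedByte& b) {
  const bool a_set = a.value != 0;
  const bool b_set = b.value != 0;
  if (a_set != b_set) return a_set ? a : b;  // exactly one is non-zero
  return (a.key <= b.key) ? a : b;           // tie: keep the earlier index
}

// Sequential host-side reduction over (index, value) pairs.
// Precondition: numbers is non-empty.
inline IndexedByte arg_first_nonzero(const std::vector<std::uint8_t>& numbers) {
  assert(!numbers.empty());
  IndexedByte result{0, numbers[0]};
  for (std::size_t i = 1; i < numbers.size(); ++i) {
    result = first_nonzero_op(result, IndexedByte{i, numbers[i]});
  }
  return result;
}
```

Because the operator always prefers a non-zero value and otherwise the smaller index, the result should not depend on the order in which pairs are combined, which is the property a parallel reduction would need.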

I want to find where the first time a '1' appears, i.e. numbers[3], 4th bit from left.

I assume you mean numbers[2]? And what exactly are you looking for as the output? The index 2 indicating the position of the first element with a set bit? Or do you want the index to the position of the first set bit?

jrhemstad Apr 11 '24 14:04

Oh, yes, numbers[2], sorry. It should return the key-value pair, in this case {2, 0b00010001}. The location of the first '1' can then be computed as 2*8+3 = 19, because there are 2 byte shifts (from the index) and 3 bit shifts (from the value). So it's like an ArgMin/ArgMax, but with a custom reduction.

yjian012 Apr 11 '24 21:04
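
The arithmetic in the comment above (byte index times 8, plus the number of leading zero bits in the value) can be written out as a small host-side helper. The function names are hypothetical, and the sketch assumes 8-bit elements with bits counted from the most significant bit, as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of zero bits before the first set bit, counting from the most
// significant bit. Precondition: value != 0.
inline unsigned leading_zeros(std::uint8_t value) {
  assert(value != 0);
  unsigned count = 0;
  for (std::uint8_t mask = 0x80; (value & mask) == 0; mask >>= 1) {
    ++count;
  }
  return count;
}

// Global bit position of the first set bit, given the index of the first
// non-zero byte and that byte's value.
inline std::size_t first_set_bit_position(std::size_t index, std::uint8_t value) {
  return index * 8 + leading_zeros(value);
}
```

For the pair {2, 0b00010001} from the thread, `first_set_bit_position(2, 0x11)` evaluates to 2*8+3 = 19.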

You are describing thrust::find_if . https://nvidia.github.io/cccl/thrust/api/groups/group__searching.html#function-find-if

fkallen Apr 12 '24 04:04

That's right! I don't know much about thrust::device, but it seems that passing a C array as a device vector works. I'll take a look at the Thrust library. Thanks!

yjian012 Apr 12 '24 04:04
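
The find-if pattern suggested above can be sketched on the host with `std::find_if`, which compiles without CUDA. `thrust::find_if` takes the same `(first, last, pred)` arguments, optionally preceded by an execution policy such as `thrust::device`; on the device the predicate would typically be a functor marked `__host__ __device__`. The function name `first_nonzero_index` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first non-zero byte, or numbers.size() if
// every byte is zero.
inline std::size_t first_nonzero_index(const std::vector<std::uint8_t>& numbers) {
  const auto it = std::find_if(numbers.begin(), numbers.end(),
                               [](std::uint8_t x) { return x != 0; });
  return static_cast<std::size_t>(it - numbers.begin());
}
```

Unlike a general reduction, a find-if search only needs a predicate, so no custom Arg-style reduction operator is required for this use case.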