Give inclusive_scan an overload with init

Open jaredhoberock opened this issue 12 years ago • 6 comments

It's difficult to implement scan and other sequences of scans or sums otherwise. The init parameter can accept the "carry" from the previous sum. inclusive_scan without init can easily be implemented in terms of the more general overload by passing the first input element as init and having the first thread copy that element to the first element of the output.
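As a rough illustration of both points, here is a minimal sequential sketch. It is not Thrust's implementation; the names `inclusive_scan_init`, `inclusive_scan_no_init`, and `chunked_inclusive_scan` are hypothetical, and the sketch assumes the with-init semantics in which the first output is `op(init, *first)`.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sequential reference for an inclusive scan that takes init.
// Assumed semantics: out[i] = op(...op(op(init, in[0]), in[1])..., in[i]).
template <class InputIt, class OutputIt, class T, class BinaryOp>
OutputIt inclusive_scan_init(InputIt first, InputIt last, OutputIt result, T init, BinaryOp op)
{
  for (; first != last; ++first, ++result)
  {
    init    = op(init, *first);
    *result = init;
  }
  return result;
}

// The overload without init, expressed through the general one:
// copy the first input element to the output, then scan the rest
// with that element as init.
template <class InputIt, class OutputIt, class BinaryOp>
OutputIt inclusive_scan_no_init(InputIt first, InputIt last, OutputIt result, BinaryOp op)
{
  if (first == last)
  {
    return result;
  }
  auto seed = *first;
  *result   = seed;
  return inclusive_scan_init(++first, last, ++result, seed, op);
}

// A scan split into a sequence of smaller scans: the last output of each
// chunk is the "carry" that seeds the next chunk. Requires chunk > 0.
inline std::vector<int> chunked_inclusive_scan(const std::vector<int>& in, std::size_t chunk)
{
  std::vector<int> out(in.size());
  int carry = 0; // identity of std::plus, so the first chunk is unaffected
  for (std::size_t begin = 0; begin < in.size(); begin += chunk)
  {
    const std::size_t end = std::min(begin + chunk, in.size());
    inclusive_scan_init(in.begin() + begin, in.begin() + end, out.begin() + begin, carry, std::plus<int>());
    carry = out[end - 1];
  }
  return out;
}
```

Without an init parameter, each chunk after the first would need a separate pass to add the carry to every output element.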

### Tasks
- [x] Add cub::BlockScan::InclusiveScan array based APIs
- [x] Add DeviceScan::InclusiveScan APIs
- [ ] https://github.com/NVIDIA/cccl/issues/2146
- [ ] Add cub::BlockScan::InclusiveScan value based APIs

jaredhoberock avatar Jun 07 '13 22:06 jaredhoberock

This will subtly change the semantics of inclusive_scan without init, but we should do it anyway to match N3724.

Currently in Thrust, if the intermediate_type differs from the input & output types, then the first element of the result will go through a conversion:

intermediate_type sum = *first;

*result = sum;

for(++first, ++result; first != last; ++first, ++result)
  *result = sum = binary_op(sum,*first);

After giving inclusive_scan an init parameter, the semantics of inclusive_scan would be

inclusive_scan(..., T init, ...)
{
  // init seeds the running sum, so the first output is binary_op(init, *first)
  for(; first != last; ++first, ++result)
    *result = init = binary_op(init,*first);
}

In other words, the intermediate_type is simply the type of init, as in exclusive_scan.
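To make the effect of the intermediate type concrete, the following sketch scans bytes with two different init types. The functions `scan_with_init` and `scan_bytes` are hypothetical names introduced for illustration, and the loop assumes the N3724-style semantics in which the first output is `binary_op(init, *first)`.

```cpp
#include <functional>
#include <vector>

// Hypothetical reference loop: the accumulator is init itself,
// so T is the intermediate type of the whole scan.
template <class InputIt, class OutputIt, class T, class BinaryOp>
OutputIt scan_with_init(InputIt first, InputIt last, OutputIt result, T init, BinaryOp op)
{
  for (; first != last; ++first, ++result)
  {
    *result = init = op(init, *first);
  }
  return result;
}

// Hypothetical helper: scans bytes into ints with an accumulator of type T.
template <class T>
std::vector<int> scan_bytes(const std::vector<unsigned char>& in, T init)
{
  std::vector<int> out(in.size());
  scan_with_init(in.begin(), in.end(), out.begin(), init, std::plus<int>());
  return out;
}
```

With the input `{200, 100, 50}`, an `int` init accumulates without wrapping, while an `unsigned char` init wraps modulo 256 even though the output elements are `int`. The caller picks the accumulator type by picking the type of init.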

jaredhoberock avatar Oct 03 '13 23:10 jaredhoberock

Better to wait and see what syntax shakes out of the Parallelism TS.

jaredhoberock avatar Jun 24 '14 00:06 jaredhoberock

To fully implement stdpar, nvc++ needs the overloads of inclusive_scan and transform_inclusive_scan that take an initial value.
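For reference, this is roughly what the standard overloads in question look like from the caller's side. The sketch uses the sequential C++17 forms from `<numeric>`; the parallel forms take an execution policy as an additional first argument. The wrapper names `running_sum_from` and `running_sum_of_squares_from` are hypothetical.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// std::inclusive_scan with an initial value: out[i] = init + in[0] + ... + in[i].
inline std::vector<int> running_sum_from(const std::vector<int>& in, int init)
{
  std::vector<int> out(in.size());
  std::inclusive_scan(in.begin(), in.end(), out.begin(), std::plus<>(), init);
  return out;
}

// std::transform_inclusive_scan with an initial value: each element is
// squared before it is added to the running sum.
inline std::vector<int> running_sum_of_squares_from(const std::vector<int>& in, int init)
{
  std::vector<int> out(in.size());
  std::transform_inclusive_scan(
    in.begin(), in.end(), out.begin(), std::plus<>(), [](int x) { return x * x; }, init);
  return out;
}
```

Note that in both calls the initial value is the last argument, after the binary (and unary) operation.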

dkolsen-pgi avatar Mar 22 '21 17:03 dkolsen-pgi

We received our first bug report from an end user about this. The lack of this function in Thrust means one of the overloads of std::inclusive_scan cannot be parallelized by nvc++ -stdpar.

dkolsen-pgi avatar Jul 05 '23 17:07 dkolsen-pgi

Any updates or ETA on this?

brycelelbach avatar May 01 '24 16:05 brycelelbach

@brycelelbach incoming

gonidelis avatar May 01 '24 16:05 gonidelis