Ignore last element in exclusive scan
The AgentScan structure doesn't distinguish between inclusive and exclusive variants at the load stage. As a result, the current version of cub::DeviceScan::ExclusiveScan requires num_items readable items in the input data. By the definition of an exclusive scan, however, each output item combines only the input items that precede it, so the last input item doesn't affect the result.
int *d_in; // e.g., [8, 6, 7, 5, 3, 0, 9]
int *d_out; // e.g., [ , , , , , , ]
cub::DeviceScan::ExclusiveSum(/*...*/);
// d_out <-- [0, 8, 14, 21, 26, 29, 29]
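To make the claim concrete, here is a minimal host-side sketch of a sequential exclusive sum that never loads the last input item. It is only a reference model for the semantics, not CUB code and not how AgentScan is implemented; the function name exclusive_sum_host is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of CUB): sequential exclusive sum.
std::vector<int> exclusive_sum_host(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;  // sum of in[0..i-1]
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        // Load in[i] only if a later output needs it,
        // so in[in.size() - 1] is never read.
        if (i + 1 < in.size()) {
            running += in[i];
        }
    }
    // Sanity check: an exclusive sum always starts with the identity (0).
    assert(out.empty() || out.front() == 0);
    return out;
}
```

Replacing the trailing 9 in the example input with any other value leaves the output unchanged, which is the property the proposed guarantee relies on.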
Converting counters into offsets is a common task. To use cub::DeviceScan::ExclusiveScan for it, one has to allocate one more input item than there are counters (num_items + 1 items in total), even though the extra item is never needed. That's not always possible, so CUB users have to assign output[0] = 0 and compute an inclusive scan for the remaining items instead. This workaround adds an extra step that could be avoided.
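The workaround can be sketched on the host as follows. This is an illustration of the two-step pattern only, using std::partial_sum as a stand-in for the inclusive scan; it is not CUB code, and the name counts_to_offsets is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical host-side sketch (not part of CUB): turn N counters into
// N + 1 offsets without padding the input with an extra item.
std::vector<int> counts_to_offsets(const std::vector<int>& counts) {
    std::vector<int> offsets(counts.size() + 1);
    // Step 1: the extra step the workaround requires.
    offsets[0] = 0;
    // Step 2: inclusive scan of the counters, written one slot to the right.
    std::partial_sum(counts.begin(), counts.end(), offsets.begin() + 1);
    // The last offset is the total number of counted items.
    assert(offsets.back() == std::accumulate(counts.begin(), counts.end(), 0));
    return offsets;
}
```

For the counters [8, 6, 7, 5, 3, 0] this yields the same seven offsets as the exclusive sum above, but the input holds only six items. On the device, step 1 is a separate write in addition to the scan itself.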
The ideal solution would be for the exclusive scan to produce num_items + 1 output items, but this would break existing applications. Instead, we could guarantee that the last input item (the one at index num_items - 1) is never accessed.