[FEA] Add `bytes_per_second` to all libcudf benchmarks
Many libcudf benchmarks report `bytes_per_second` as part of their output data. This value is useful for comparing benchmarks because it normalizes away the longer execution times that come from processing more data. Also, `bytes_per_second` and `real_time_s` together let us compute `bytes_processed` values, which serve as a useful Y-axis when plotting results against data size.
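For reference, the Google Benchmark-based suites report this metric through `SetBytesProcessed`. Below is a minimal, self-contained sketch; the `std::copy` workload and buffer sizes are illustrative stand-ins, not an actual libcudf benchmark:

```cpp
#include <benchmark/benchmark.h>

#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Illustrative gbench workload showing how bytes_per_second is produced.
static void BM_copy(benchmark::State& state)
{
  std::vector<std::int32_t> input(state.range(0));
  std::iota(input.begin(), input.end(), 0);
  std::vector<std::int32_t> output(input.size());

  for (auto _ : state) {
    std::copy(input.begin(), input.end(), output.begin());
    benchmark::DoNotOptimize(output.data());
  }

  // Count input + output bytes; gbench divides the total by real time
  // to emit the bytes_per_second column.
  auto const bytes_per_iter = 2 * input.size() * sizeof(std::int32_t);
  state.SetBytesProcessed(static_cast<std::int64_t>(state.iterations() * bytes_per_iter));
}
BENCHMARK(BM_copy)->Range(1 << 10, 1 << 20);
BENCHMARK_MAIN();
```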
As of the end of 23.12 development, many benchmarks still do not report `bytes_per_second` in the output data. Here is a table summarizing the metrics reported by the benchmark suite.
| benchmark | status | notes |
|---|---|---|
| APPLY_BOOLEAN_MASK | see #13937 | |
| BINARYOP | see #13938 | |
| COPY_IF_ELSE | see #13960 | |
| GROUPBY | see #13984 | |
| GROUPBY_NV | | |
| HASHING | see #13967, #13965 | |
| JOIN | | |
| JOIN_NV | | |
| QUANTILES | | |
| REDUCTION | | |
| REPLACE | | |
| SEARCH | | |
| SEARCH_NV | | |
| SET_OPS_NV | | |
| SHIFT | see #13950 | |
| SORT | | |
| SORT_NV | | |
| STREAM_COMPACTION_NV | see #14172 | |
| TRANSPOSE | see #14170 | |
Note: For this tracking list, cuIO benchmarks are omitted because "encoded file size" serves a similar purpose.
Hi, I want to start contributing to cudf and would like to work on this issue.
Should I create one PR per benchmark or put all in one PR?
> Should I create one PR per benchmark or put all in one PR?
Thanks for looking into this. Smaller PRs are easier to review, so I would prefer one PR per benchmark.
It is a good idea to have some metrics for data size. I wonder whether it would be even better to have separate metrics for input and output data. For example, we could add custom metrics like `input_items_per_second`, `output_items_per_second`, `input_bytes_per_second`, and `output_bytes_per_second`. The first two count the number of items (usually rows), and the last two count the byte sizes of the input and output.
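A minimal sketch of how these could be wired up with Google Benchmark's custom counters; the four counter names are only the proposal above, not an existing libcudf convention, and the workload and sizes are illustrative:

```cpp
#include <benchmark/benchmark.h>

#include <cstdint>

static void BM_split_metrics(benchmark::State& state)
{
  std::int64_t num_rows = state.range(0);
  for (auto _ : state) {
    benchmark::DoNotOptimize(num_rows);  // stand-in for the real kernel
  }

  // kIsRate asks gbench to divide each value by the elapsed real time,
  // so we accumulate totals across all iterations first.
  auto const iters     = static_cast<double>(state.iterations());
  auto const in_items  = iters * static_cast<double>(num_rows);
  auto const out_items = in_items;  // this stand-in writes one row per input row
  auto const in_bytes  = in_items * sizeof(std::int32_t);
  auto const out_bytes = out_items * sizeof(std::int32_t);

  state.counters["input_items_per_second"]  = benchmark::Counter(in_items, benchmark::Counter::kIsRate);
  state.counters["output_items_per_second"] = benchmark::Counter(out_items, benchmark::Counter::kIsRate);
  state.counters["input_bytes_per_second"]  = benchmark::Counter(in_bytes, benchmark::Counter::kIsRate);
  state.counters["output_bytes_per_second"] = benchmark::Counter(out_bytes, benchmark::Counter::kIsRate);
}
BENCHMARK(BM_split_metrics)->Arg(1 << 20);
BENCHMARK_MAIN();
```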
Thank you @jihoonson for your suggestion. I agree that we need better and more granular throughput metrics. I would like to start with an input+output bytes per second metric that is consistently implemented in all benchmarks. From there we can add the separate input/output and bytes/items metrics as you suggest.
@GregoryKimball sounds good 🙂 As my first PR has been merged, I will continue working on the remaining benchmarks.
> I would like to start with an input+output bytes per second metric that is consistently implemented in all benchmarks.
I started thinking that perhaps we should enforce all benchmarks to add such metrics. One way to do that is to introduce new benchmark fixtures that fail at the end of a benchmark run if those metrics are not set. Then we could update every benchmark to use the new fixtures. @GregoryKimball do you think this is a good idea?
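A sketch of the fixture idea for the Google Benchmark-based suites; `throughput_fixture` is hypothetical, not an existing libcudf class:

```cpp
#include <benchmark/benchmark.h>

#include <cstdlib>
#include <iostream>

// Hypothetical fixture that aborts when a benchmark forgets to report
// its data size via SetBytesProcessed().
class throughput_fixture : public ::benchmark::Fixture {
 public:
  void TearDown(::benchmark::State& state) override
  {
    if (state.bytes_processed() == 0) {
      std::cerr << "benchmark did not report bytes_processed\n";
      std::abort();
    }
  }
};

BENCHMARK_F(throughput_fixture, example)(::benchmark::State& state)
{
  for (auto _ : state) {}
  state.SetBytesProcessed(state.iterations() * 1024);  // illustrative size
}
BENCHMARK_MAIN();
```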
> we should enforce all benchmarks to add such metrics
Thank you @jihoonson for this suggestion. We might go this route at some point, but for now I think converting the rest of the benchmarks from gbench to nvbench and adding the missing metrics is higher priority.
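For reference, nvbench derives its throughput columns from declared element counts and global-memory traffic, so converted benchmarks get these metrics almost for free. A minimal sketch, with illustrative sizes and a placeholder kernel:

```cpp
#include <nvbench/nvbench.cuh>

#include <cstddef>

void bench_example(nvbench::state& state)
{
  auto const num_rows = static_cast<std::size_t>(state.get_int64("num_rows"));

  // Declare input + output traffic; nvbench derives element and byte
  // throughput columns from these declarations.
  state.add_element_count(num_rows);
  state.add_global_memory_reads<nvbench::int32_t>(num_rows);
  state.add_global_memory_writes<nvbench::int32_t>(num_rows);

  state.exec([&](nvbench::launch& launch) {
    // launch the kernel under test on launch.get_stream()
  });
}
NVBENCH_BENCH(bench_example).add_int64_axis("num_rows", {1 << 20});
```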
@davidwendt and I discussed that the new `alloc_size` member function should be a good tool to get quick and sufficiently accurate size estimates.
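A hedged sketch of how that might look, assuming `alloc_size()` is a member of `cudf::column` that returns the column's total device allocation in bytes; check the current cudf headers for the exact owner and signature:

```cpp
#include <cudf/column/column.hpp>

#include <benchmark/benchmark.h>

#include <cstdint>

// Hypothetical helper: derive bytes_per_second from the device
// allocations of a benchmark's input and output columns. The
// alloc_size() call is an assumption based on the discussion above.
void set_bytes_processed(::benchmark::State& state,
                         cudf::column const& input,
                         cudf::column const& output)
{
  auto const bytes = input.alloc_size() + output.alloc_size();
  state.SetBytesProcessed(state.iterations() * static_cast<std::int64_t>(bytes));
}
```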