
[FEA] Add `bytes_per_second` to all libcudf benchmarks

GregoryKimball opened this issue on Jul 23 '23

Many libcudf benchmarks report `bytes_per_second` as part of their output data. This value is useful for comparing benchmarks because it normalizes away the increase in execution time that comes from processing more data. Also, `bytes_per_second` and `real_time_s` together let us compute `bytes_processed` values, which serve as a useful Y-axis.
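For context, `bytes_per_second` in the gbench-based benchmarks comes from `benchmark::State::SetBytesProcessed`: the benchmark reports the total bytes it touched, and Google Benchmark divides by the measured time. A minimal standalone sketch (the copy workload is illustrative, not an actual libcudf benchmark):

```cpp
#include <benchmark/benchmark.h>

#include <algorithm>
#include <cstdint>
#include <vector>

static void BM_copy(benchmark::State& state)
{
  std::vector<std::int64_t> src(static_cast<std::size_t>(state.range(0)));
  std::vector<std::int64_t> dst(src.size());

  for (auto _ : state) {
    std::copy(src.begin(), src.end(), dst.begin());
    benchmark::DoNotOptimize(dst.data());
  }

  // Google Benchmark divides this total by the measured time and emits the
  // result as bytes_per_second in the benchmark output.
  state.SetBytesProcessed(
    static_cast<std::int64_t>(state.iterations() * src.size() * sizeof(std::int64_t)));
}
BENCHMARK(BM_copy)->Arg(1 << 20);
BENCHMARK_MAIN();
```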

As of the end of 23.12 development, many benchmarks still do not report `bytes_per_second` in their output data. Here is a figure summarizing the metrics reported by the benchmark suite.

*[Figure: metrics reported by each benchmark in the suite]*

| benchmark | notes |
| --- | --- |
| APPLY_BOOLEAN_MASK | see #13937 |
| BINARYOP | see #13938 |
| COPY_IF_ELSE | see #13960 |
| GROUPBY | see #13984 |
| GROUPBY_NV | |
| HASHING | see #13967, #13965 |
| JOIN | |
| JOIN_NV | |
| QUANTILES | |
| REDUCTION | |
| REPLACE | |
| SEARCH | |
| SEARCH_NV | |
| SET_OPS_NV | |
| SHIFT | see #13950 |
| SORT | |
| SORT_NV | |
| STREAM_COMPACTION_NV | see #14172 |
| TRANSPOSE | see #14170 |

Note: For this tracking list, cuIO benchmarks are omitted because "encoded file size" serves a similar purpose.

GregoryKimball · Jul 23 '23

Hi, I want to start contributing to cudf and would like to work on this issue.

Should I create one PR per benchmark, or put them all in one PR?

Blonck · Aug 22 '23

> Should I create one PR per benchmark, or put them all in one PR?

Thanks for looking into this. Smaller PRs are easier to review, so I would prefer one PR per benchmark.

davidwendt · Aug 23 '23

It is a good idea to have some metrics for data size. I wonder whether it would be even better to have separate metrics for input and output data. For example, we could add custom metrics like `input_items_per_second`, `output_items_per_second`, `input_bytes_per_second`, and `output_bytes_per_second`. The first two would count items (usually rows) per second, and the last two would count input and output bytes per second.
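For reference, nvbench already has hooks close to these four metrics: `add_element_count` feeds an items-per-second summary, while `add_global_memory_reads` / `add_global_memory_writes` feed the global-memory bandwidth summary (reads and writes are summed in the reported rate, so a fully separate input/output split would still need custom summaries). A sketch of how the proposed metrics might map onto these hooks (the sizes and empty benchmark body are placeholders):

```cpp
#include <nvbench/nvbench.cuh>

#include <cstdint>

void example_bench(nvbench::state& state)
{
  auto const num_rows     = state.get_int64("num_rows");
  auto const input_bytes  = num_rows * sizeof(std::int32_t);
  auto const output_bytes = num_rows * sizeof(std::int32_t);

  state.add_element_count(num_rows, "num_rows");                 // items (rows)
  state.add_global_memory_reads<nvbench::int8_t>(input_bytes);   // input bytes
  state.add_global_memory_writes<nvbench::int8_t>(output_bytes); // output bytes

  state.exec([](nvbench::launch& launch) {
    // the kernel under test would run on launch.get_stream() here
  });
}
NVBENCH_BENCH(example_bench).add_int64_axis("num_rows", {1 << 20});
```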

jihoonson · Jun 26 '24

Thank you @jihoonson for your suggestion. I agree that we need better, more granular throughput metrics. I would like to start with a combined input+output bytes-per-second metric that is implemented consistently across all benchmarks. From there, we can add the separate input/output and bytes/items metrics you suggest.

GregoryKimball · Jul 03 '24

@GregoryKimball sounds good 🙂 As my first PR has been merged, I will continue working on the remaining benchmarks.

jihoonson · Jul 03 '24

> I would like to start with a combined input+output bytes-per-second metric that is implemented consistently across all benchmarks.

I started thinking that perhaps we should require all benchmarks to add such metrics. One way to do it is to add new benchmark fixtures that fail at the end of a run if those metrics are not set, as sketched below. Then we could update every benchmark to use the new fixtures. @GregoryKimball do you think this is a good idea?
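A hypothetical sketch of such an enforcing fixture in Google Benchmark terms (the class name and failure behavior are assumptions, not existing cudf code):

```cpp
#include <benchmark/benchmark.h>

#include <stdexcept>

class throughput_fixture : public ::benchmark::Fixture {
 public:
  void TearDown(const ::benchmark::State& state) override
  {
    // "Explode" if the benchmark body never reported its processed bytes.
    if (state.bytes_processed() == 0) {
      throw std::logic_error("benchmark did not set bytes_processed");
    }
  }
};

BENCHMARK_DEFINE_F(throughput_fixture, example)(::benchmark::State& state)
{
  for (auto _ : state) {
    // ... work under test ...
  }
  state.SetBytesProcessed(state.iterations() * 1024);  // satisfies the fixture
}
BENCHMARK_REGISTER_F(throughput_fixture, example);
BENCHMARK_MAIN();
```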

jihoonson · Jul 03 '24

> we should require all benchmarks to add such metrics

Thank you @jihoonson for this suggestion. We might go this route at some point, but for now I think converting the rest of the benchmarks from gbench to nvbench and adding the missing metrics is higher priority.

GregoryKimball · Aug 01 '24

@davidwendt and I discussed that the new `alloc_size` member function should be a good tool for getting quick, sufficiently accurate size estimates.
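A sketch of how that could feed the throughput metric, assuming `alloc_size()` is a member of `cudf::column` that returns the column's total device allocation in bytes (the name is taken from the comment above; the exact location and signature are assumptions and should be checked against the current headers):

```cpp
#include <cudf/column/column.hpp>

#include <nvbench/nvbench.cuh>

// Assumed API: cudf::column::alloc_size() returning total allocated bytes.
// Reports combined input+output bytes, per the plan earlier in this thread.
void set_throughput_metrics(nvbench::state& state,
                            cudf::column const& input,
                            cudf::column const& output)
{
  state.add_global_memory_reads<nvbench::int8_t>(input.alloc_size());
  state.add_global_memory_writes<nvbench::int8_t>(output.alloc_size());
}
```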

GregoryKimball · May 13 '25