
[FEA] Update chunked parquet reader benchmarks to include `pass_read_limit`

Open · GregoryKimball opened this issue on Feb 14, 2024 · 0 comments

Is your feature request related to a problem? Please describe.

The `BM_parquet_read_chunks` benchmark in `benchmarks/io/parquet/parquet_reader_input.cpp` includes a `byte_limit` nvbench axis, which controls the reader's `chunk_read_limit`. With the features added in #14360, the `chunked_parquet_reader` API now exposes both `chunk_read_limit` and `pass_read_limit` parameters to control reader behavior, but we currently have no way to benchmark different `pass_read_limit` values.
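
For context, here is a minimal sketch of how the two limits are passed to the reader. It assumes the constructor overload added in #14360 takes `chunk_read_limit` followed by `pass_read_limit`; the function name and file path are placeholders.

```cpp
#include <cudf/io/parquet.hpp>

#include <cstddef>
#include <string>

// Read a Parquet file in chunks, bounding both the size of each returned
// table chunk and the amount of input data processed per decode pass.
// A limit of 0 is treated as "unlimited" for that dimension.
void read_in_chunks(std::string const& path,
                    std::size_t chunk_read_limit,  // max bytes per returned table chunk (output limit)
                    std::size_t pass_read_limit)   // max bytes of input data per pass (input limit)
{
  auto const options =
    cudf::io::parquet_reader_options::builder(cudf::io::source_info{path}).build();

  cudf::io::chunked_parquet_reader reader(chunk_read_limit, pass_read_limit, options);

  while (reader.has_next()) {
    auto chunk = reader.read_chunk();  // cudf::io::table_with_metadata
    // ... consume chunk.tbl ...
  }
}
```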

Describe the solution you'd like

  • [ ] Add a new benchmark, such as `BM_parquet_read_subrowgroup_chunks`, that provides nvbench axes for both `chunk_read_limit` and `pass_read_limit` (see the registration sketch after this list)
  • [ ] Rename `byte_limit` to `chunk_read_limit` in `BM_parquet_read_chunks` for clarity, now that chunked parquet reading has both input and output byte limits.
  • [ ] Also, please consider adding an nvbench axis for `data_size`, at least for the chunked parquet reader benchmarks. It would be useful to run these benchmarks on tables larger than 536 MB.
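
A rough sketch of what the new benchmark's registration could look like. The function name (from the first item), the axis names, and the axis values are all placeholders; the body would mirror `BM_parquet_read_chunks` but forward both limits to `chunked_parquet_reader` and generate a table of `data_size` bytes.

```cpp
#include <nvbench/nvbench.cuh>

// Proposed benchmark exercising sub-rowgroup chunking: the body would follow
// BM_parquet_read_chunks, passing chunk_read_limit and pass_read_limit to
// cudf::io::chunked_parquet_reader. Signature shown here is illustrative.
void BM_parquet_read_subrowgroup_chunks(nvbench::state& state);

NVBENCH_BENCH(BM_parquet_read_subrowgroup_chunks)
  .set_name("parquet_read_subrowgroup_chunks")
  .add_int64_axis("cardinality", {0, 1000})
  .add_int64_power_of_two_axis("num_cols", nvbench::range(4, 6, 1))
  .add_int64_axis("chunk_read_limit", {0, 500'000})        // output limit in bytes (0 = unlimited)
  .add_int64_axis("pass_read_limit", {0, 268'435'456})     // input limit in bytes (0 = unlimited)
  .add_int64_axis("data_size", {536'870'912, 2'147'483'648});  // table size in bytes, beyond 536 MB
```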

Describe alternatives you've considered

n/a
