[FEA] Update chunked parquet reader benchmarks to include `pass_read_limit`
Is your feature request related to a problem? Please describe.
The `BM_parquet_read_chunks` benchmark in `benchmarks/io/parquet/parquet_reader_input.cpp` includes a `byte_limit` nvbench axis. This axis controls the `chunk_read_limit`. With the new features added in #14360, there is a new `chunked_parquet_reader` API that exposes both `chunk_read_limit` and `pass_read_limit` parameters to control reader behavior. We currently do not have a way to benchmark `pass_read_limit` values.
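For reference, a minimal sketch of how the new API is driven (the helper name below is hypothetical, and the constructor argument order, output chunk limit followed by input pass limit, is assumed from #14360):

```cpp
#include <cudf/io/parquet.hpp>

#include <cstddef>

// Hypothetical helper: drain a chunked parquet reader configured with both limits.
void read_all_chunks(cudf::io::parquet_reader_options const& options,
                     std::size_t chunk_read_limit,  // max bytes per output chunk, 0 = no limit
                     std::size_t pass_read_limit)   // max temporary bytes per input pass, 0 = no limit
{
  auto reader = cudf::io::chunked_parquet_reader(chunk_read_limit, pass_read_limit, options);
  while (reader.has_next()) {
    [[maybe_unused]] auto chunk = reader.read_chunk();  // cudf::io::table_with_metadata
  }
}
```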
Describe the solution you'd like
- [ ] Add a new benchmark, such as `BM_parquet_read_subrowgroup_chunks`, that provides nvbench axes for both `chunk_read_limit` and `pass_read_limit` (a rough sketch follows this list).
- [ ] Rename `byte_limit` to `chunk_read_limit` in `BM_parquet_read_chunks` for clarity, now that we have both input and output byte limits in chunked parquet reading.
- [ ] Also, please consider adding an nvbench axis for `data_size` for at least the chunked parquet reader benchmarks. It would be useful to allow the benchmarks to operate on tables larger than 536 MB.
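A rough sketch of what the new benchmark could look like. The benchmark name, axis values, and the single-column placeholder input are illustrative only; the real benchmark would reuse the random-table and source/sink helpers that the existing benchmarks in `benchmarks/io` use, and the constructor argument order is assumed from #14360.

```cpp
#include <cudf/filling.hpp>
#include <cudf/io/parquet.hpp>
#include <cudf/scalar/scalar_factories.hpp>
#include <cudf/table/table_view.hpp>

#include <nvbench/nvbench.cuh>

#include <cstddef>
#include <vector>

void BM_parquet_read_subrowgroup_chunks(nvbench::state& state)
{
  auto const chunk_read_limit = static_cast<std::size_t>(state.get_int64("chunk_read_limit"));
  auto const pass_read_limit  = static_cast<std::size_t>(state.get_int64("pass_read_limit"));

  // Placeholder input: a single int32 column written to a host buffer. The real
  // benchmark would build a larger, multi-type table with the existing
  // data-generation helpers (and ideally honor a data_size axis).
  auto col = cudf::sequence(1 << 20, *cudf::make_fixed_width_scalar<int32_t>(0));
  auto tbl = cudf::table_view({col->view()});

  std::vector<char> parquet_buffer;
  cudf::io::write_parquet(
    cudf::io::parquet_writer_options::builder(cudf::io::sink_info(&parquet_buffer), tbl).build());

  auto const options =
    cudf::io::parquet_reader_options::builder(
      cudf::io::source_info(parquet_buffer.data(), parquet_buffer.size()))
      .build();

  state.exec(nvbench::exec_tag::sync, [&](nvbench::launch&) {
    // Assumed constructor order from #14360: output (chunk) limit, then input (pass) limit.
    auto reader = cudf::io::chunked_parquet_reader(chunk_read_limit, pass_read_limit, options);
    while (reader.has_next()) {
      [[maybe_unused]] auto chunk = reader.read_chunk();
    }
  });
}

NVBENCH_BENCH(BM_parquet_read_subrowgroup_chunks)
  .set_name("parquet_read_subrowgroup_chunks")
  .add_int64_axis("chunk_read_limit", {0, 500'000'000})
  .add_int64_axis("pass_read_limit", {0, 500'000'000});
```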
Describe alternatives you've considered
n/a