cudf [FEA] Expand ORC and Parquet benchmarks to cover different stripe/rowgroup sizes

Open vuule opened this issue 3 years ago • 5 comments

Add a set of benchmarks with varying stripe/rowgroup sizes to each affected component:

[ ] ORC reader
[ ] ORC writer
[ ] Parquet reader
[ ] Parquet writer

Use the new benchmarks to evaluate the effects of these options and potentially determine the optimal settings.

Jan 21 '22 00:01 vuule

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Feb 20 '22 00:02 github-actions[bot]

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

May 21 '22 00:05 github-actions[bot]

Note: any PRs that change benchmarks are encouraged/required to migrate the benchmarks to NVBench.

Aug 02 '22 18:08 vuule

Perhaps instead of adding a new benchmark, we should just add a new nvbench axis for rowgroup/stripe size. Then we would be able to do targeted studies without increasing the benchmark complexity or runtime.

Jul 05 '23 16:07 GregoryKimball

Do you mean that we should add an axis that only has the default value?

Jul 05 '23 19:07 vuule
