[FEA] Expand ORC and Parquet benchmarks to cover different stripe/rowgroup sizes

Open vuule opened this issue 3 years ago • 5 comments

Add a set of benchmarks with varying stripe/rowgroup sizes to each affected component:

  • [ ] ORC reader
  • [ ] ORC writer
  • [ ] Parquet reader
  • [ ] Parquet writer

Use the new benchmarks to evaluate how stripe/rowgroup size affects reader and writer performance, and potentially determine the optimal settings.

vuule avatar Jan 21 '22 00:01 vuule

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Feb 20 '22 00:02 github-actions[bot]

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] avatar May 21 '22 00:05 github-actions[bot]

Note: any PRs that change benchmarks are encouraged/required to migrate the benchmarks to NVBench.

vuule avatar Aug 02 '22 18:08 vuule

Perhaps instead of adding a new benchmark, we should just add a new nvbench axis for rowgroup/stripe size. Then we would be able to do targeted studies without increasing the benchmark complexity or runtime.

GregoryKimball avatar Jul 05 '23 16:07 GregoryKimball

Do you mean that we should add an axis that only has the default value?

vuule avatar Jul 05 '23 19:07 vuule
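
A minimal sketch of the trade-off raised in the last two comments, under stated assumptions: if a benchmark's configurations are the cross product of its axes, an axis holding a single value leaves the default configuration count unchanged, while an override of that one axis widens the sweep for a targeted study. The `Axis` struct, the helper functions, and every axis name and value below are hypothetical placeholders. This is a model of the counting argument, not the NVBench API and not cudf's benchmark code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one benchmark axis: a name plus the values to sweep.
// This only models how axes multiply; it is not how NVBench represents axes.
struct Axis {
  std::string name;
  std::vector<std::int64_t> values;
};

// Total number of benchmark configurations: the product of all axis lengths.
std::size_t config_count(const std::vector<Axis>& axes) {
  std::size_t count = 1;
  for (const auto& axis : axes) {
    assert(!axis.values.empty() && "every axis needs at least one value");
    count *= axis.values.size();
  }
  return count;
}

// Return a copy of `axes` with the named axis's values replaced, the way a
// per-run override would widen a single axis for a targeted study.
std::vector<Axis> with_override(std::vector<Axis> axes,
                                const std::string& name,
                                std::vector<std::int64_t> values) {
  for (auto& axis : axes) {
    if (axis.name == name) { axis.values = values; }
  }
  return axes;
}

// Axes of an imaginary Parquet writer benchmark. All names and values are
// placeholders chosen for illustration only.
std::vector<Axis> example_axes() {
  return {
    {"data_type", {0, 1, 2}},
    {"compression", {0, 1}},
    // Single-value axis: multiplies the configuration count by one.
    {"row_group_size_rows", {1000000}},
  };
}
```

With these placeholder axes, the default set has 3 × 2 × 1 = 6 configurations, the same as without the size axis. Calling `with_override(example_axes(), "row_group_size_rows", {100000, 1000000, 10000000})` yields 18 configurations for that run only, so the cost of the wider sweep is paid only when someone asks for it.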