
[Minor] Extend the Parquet writer's dictionary encoding benchmark.

mhaseeb123 opened this pull request 1 year ago • 4 comments

Description

This PR extends the data cardinality and run-length ranges for the existing Parquet writer's dictionary encoding benchmark.

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

mhaseeb123 avatar Aug 17 '24 01:08 mhaseeb123

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Aug 19 '24 21:08 copy-pr-bot[bot]

/ok to test

mhaseeb123 avatar Aug 19 '24 21:08 mhaseeb123

what's the reason for this change?

vuule avatar Aug 20 '24 11:08 vuule

> what's the reason for this change?

First of all, welcome back. Greg wanted me to push any updates I made to the benchmark for #16541. That said, my remaining local changes (even wider extended ranges) need not be pushed upstream unless needed.

mhaseeb123 avatar Aug 20 '24 17:08 mhaseeb123

/ok to test

mhaseeb123 avatar Sep 06 '24 23:09 mhaseeb123

/ok to test

mhaseeb123 avatar Sep 07 '24 01:09 mhaseeb123

/ok to test

mhaseeb123 avatar Sep 09 '24 18:09 mhaseeb123

/merge

mhaseeb123 avatar Sep 09 '24 21:09 mhaseeb123

Would be nice to know how much time this increases in benchmark runs. If it is not available now, follow up with Randy on benchmark runs.

Results are in #16541 (here), which is the work this benchmark is being extended for. Each new benchmark in the matrix takes roughly 0.5 s to run on my workstation (AMD Threadripper + RTX Ada 5880), so with 8 new benchmarks this should add roughly 4 s of total runtime.
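The runtime estimate above can be sketched as a quick back-of-the-envelope calculation. The axis values below are hypothetical placeholders chosen only to illustrate how extending two benchmark axes grows the matrix multiplicatively; the per-benchmark time (~0.5 s) and the count of 8 new configurations are the numbers quoted in this comment, not fresh measurements:

```python
from itertools import product

# Hypothetical axis values for illustration only; the actual values used
# in the PR's benchmark matrix may differ.
base_cardinality = [0, 1_000]
extended_cardinality = base_cardinality + [10_000, 100_000]

base_run_length = [1, 32]
extended_run_length = base_run_length + [1_024]

# A benchmark runs once per (cardinality, run_length) combination.
base_matrix = list(product(base_cardinality, base_run_length))
extended_matrix = list(product(extended_cardinality, extended_run_length))

seconds_per_benchmark = 0.5  # rough per-configuration time quoted above
new_benchmarks = len(extended_matrix) - len(base_matrix)
added_runtime = new_benchmarks * seconds_per_benchmark

print(f"{new_benchmarks} new benchmarks, ~{added_runtime:.1f} s added")
```

With these placeholder axes the matrix grows from 2×2 = 4 to 4×3 = 12 configurations, i.e. 8 new benchmarks at ~0.5 s each, matching the ~4 s estimate.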

mhaseeb123 avatar Sep 09 '24 21:09 mhaseeb123