cudf
Add `bytes_per_second` to groupby max benchmark.
This patch adds memory statistics for the GROUPBY_NVBENCH benchmark using the max aggregation.
For this purpose, helper functions are introduced to compute the payload size for:
- Column
- Table
- Groupby execution results
This patch relates to #13735.
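A minimal sketch of what such helpers could look like (the names `required_bytes` and the recursion scheme are illustrative assumptions, not the actual PR code): the column overload handles fixed-width payloads and recurses into child columns to cover nested types, and the table overload sums over its columns.

```cpp
#include <cudf/column/column_view.hpp>
#include <cudf/table/table_view.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_dispatcher.hpp>
#include <cstddef>
#include <numeric>

// Hypothetical helper: payload size of a column in bytes, including children.
std::size_t required_bytes(cudf::column_view const& col)
{
  std::size_t bytes = 0;
  if (cudf::is_fixed_width(col.type())) {
    bytes = static_cast<std::size_t>(col.size()) * cudf::size_of(col.type());
  }
  // Recurse into child columns to cover nested types (lists, structs, strings).
  for (auto it = col.child_begin(); it != col.child_end(); ++it) {
    bytes += required_bytes(*it);
  }
  return bytes;
}

// Hypothetical helper: payload size of a table as the sum over its columns.
std::size_t required_bytes(cudf::table_view const& table)
{
  return std::accumulate(table.begin(), table.end(), std::size_t{0},
                         [](std::size_t acc, cudf::column_view const& col) {
                           return acc + required_bytes(col);
                         });
}
```

A groupby-result overload could then simply sum `required_bytes` over the keys table and each result column.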
Checklist
- [x] I am familiar with the Contributing Guidelines.
Hi, I added a few helper functions which I would also use for the other groupby benchmarks.
PS: Should I add the other groupby benchmarks to this PR to reduce the amount of PR for #13735 a bit?
Would other benchmarks benefit from those helper functions as well? If not, we can move them to `benchmarks/groupby/group_common.hpp`.
Should I add the other groupby benchmarks to this PR to reduce the amount of PR for #13735 a bit?
Sounds reasonable to me.
Would other benchmarks benefit from those helper functions as well? If not, we can move them to `benchmarks/groupby/group_common.hpp`.
I think so. They could be used in a few of the benchmarks to simplify the code.
/ok to test
Hi, the PR is not ready; I haven't had enough time over the last two days, so there's no need to review it yet :smiley:.
Hi @PointKernel , @davidwendt,
In the current state of the PR, every NVBENCH benchmark has `bytes_per_second` added. However, I'm a bit unhappy with its current state. Specifically, I'd like to introduce test code to validate the calculations for `required_bytes`, especially when dealing with nested column types.
I'm considering adding these utility functions to the test/utilities directory and creating corresponding tests there. I've noticed that similar utility functions used in the tests are also employed within the benchmarks. However, I think that housing benchmark-specific helper functions under the test directories is not optimal. Do you have any better ideas?
PS: Is there something like a nested column iterator, which iterates over all columns of a table, including child columns? I've not found anything like this, and I'm also unsure if it's feasible to implement, but it would be great for `required_bytes` :sweat_smile:.
PPS: benchmark output perf_log.txt
Hi @Blonck, this is nice work and I'd like to get this change merged. However, it seems that you haven't had a chance to finish this up. Would you mind if I take it from where you left off and finish it?
You may want to look into using `cudf::row_bit_count` to get the column/table sizes: https://docs.rapids.ai/api/libcudf/stable/group__transformation__transform.html#gaa78354f8eda093519182149710215d6f
It already handles all cudf types. The output is the number of bits per row, but you can use `cudf::reduce` to get the total size and divide by 8 to get the number of bytes.
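The suggestion above could be sketched roughly as follows (an illustrative helper, not code from this PR; it assumes a valid `cudf::table_view` and a configured CUDA device):

```cpp
#include <cudf/aggregation.hpp>
#include <cudf/reduction.hpp>
#include <cudf/scalar/scalar.hpp>
#include <cudf/table/table_view.hpp>
#include <cudf/transform.hpp>
#include <climits>
#include <cstdint>

// Hypothetical helper: total payload size of a table in bytes, covering
// nested types via cudf::row_bit_count.
std::int64_t table_size_bytes(cudf::table_view const& view)
{
  // One INT32 entry per row, holding that row's size in bits.
  auto const bits_per_row = cudf::row_bit_count(view);

  // Sum the per-row bit counts into a single INT64 scalar.
  auto const agg    = cudf::make_sum_aggregation<cudf::reduce_aggregation>();
  auto const result = cudf::reduce(bits_per_row->view(), *agg,
                                   cudf::data_type{cudf::type_id::INT64});

  auto const total_bits =
    static_cast<cudf::numeric_scalar<std::int64_t> const&>(*result).value();
  return total_bits / CHAR_BIT;  // convert bits to bytes
}
```

The result could then be fed to NVBench via `state.add_global_memory_reads` / `add_global_memory_writes` so that `bytes_per_second` is reported.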
Thanks @davidwendt, this is very useful! I will check out the function you suggested.
Thanks for working on this @Blonck! It looks like this was a helpful starting point for #16126, which ultimately superseded this PR.