cudf
Add TPC-H benchmarks for Libcudf
- [x] Query 1
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
Pull request vetters can view their responsibilities here.
Contributors can view more details about this message here.
/ok to test
@JayjeetAtGithub would you please split the fixed_point literals change into a separate PR?
Sure @GregoryKimball
Thanks for working on this @JayjeetAtGithub! I will review it promptly. Meanwhile, could you please update the description (it should mostly be copy-paste from the README, plus a few other details) and briefly explain what this PR is about? Thanks!
/ok to test
@JayjeetAtGithub These TPC-H-inspired examples look excellent. I would also like to post your performance results so far to show off your great work.
Just to help clarify for future reference, this is SF10 running on NVIDIA T4 GPU, I believe.
Hi all, I have ported the queries to work on TPC-H parquet files in which the decimal64 columns are cast to float64. That means I am no longer using `cudf::fixed_point_scalar<numeric::decimal64>` and am instead using just `cudf::numeric_scalar<double>`. I have checked the query results, and they are consistent.
@JayjeetAtGithub Can you help me understand why we aren’t using decimal types? I think that using the same data types is essential for parity of benchmarks with other implementations.
@bdice I found out that the other benchmarks within the team are using the TPC-H datasets in the /datasets/NOOOM- directory on the RDS servers, which have the decimal64 columns cast to float64. So I changed to using float64. But I can always change my code back to decimal64 if that would work better.
/ok to test
/ok to test
The PR title says these are benchmarks, but they are in `cpp/examples` rather than `cpp/benchmarks`. If they are indeed benchmarks, I think they should be moved. Also, they should use one of the benchmarking frameworks we use, either google benchmark or nvbench. I think we are generally preferring the latter, but I often run into "noise" problems with it because it is intended as a microbenchmark framework for device-side code rather than combined application-level benchmarking like this. So gbench may work better.
I think Mark’s proposal is a good idea. It should allow us to integrate with the existing microbenchmarking measurement frameworks we have, and therefore allow us to track performance over time. Do you have thoughts on this @JayjeetAtGithub @GregoryKimball @mhaseeb123? I guess creating/finding data files is the hardest part.
I think this is okay under examples for now - actually it is a good example and depicts good use of libcudf as building block. I agree with the proposal that this example may evolve to a benchmark in the future with preferably data generation from within the C++ code rather than using static data files.
Thanks a lot for your thoughts on this @harrism @bdice @mhaseeb123. Along the same lines as @mhaseeb123 mentioned, I would prefer to keep the code under examples in this PR and would like to evolve this work as follows (probably each of these would be a separate PR):

- Add small (SF=0.01) datasets to a `cpp/examples/tpch/datasets` directory and use them to run the examples in CI
- Add a way to test the correctness of these queries: add parquet files containing the results of the queries (at smaller SFs such as 0.01) generated from some other source such as DuckDB / Velox, and compare them with the results generated by the `libcudf` implementations in this PR
- Convert the code into NVBENCH benchmarks and move it to a `cpp/benchmarks/tpch` sub-directory, as mentioned by @harrism and @bdice
- Hopefully, by that time, cudf-gen will be mature enough that, just as @mhaseeb123 mentioned, we can remove the static data files and use the generator to produce Parquet data directly in host buffers within the benchmark code
- Keep adding more queries on the side, asynchronously, to complete the suite

Please let me know your thoughts! Thanks
/ok to test
/ok to test
/ok to test
/ok to test
/ok to test
/ok to test
/merge
/merge