cudf
Refactor CSV reader benchmarks with nvbench
Description
Closes #10941
This PR refactors the CSV reader benchmarks with nvbench and reduces the number of test cases by isolating data type, IO type, column selection, and row selection.
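To illustrate why isolating the axes shrinks the suite, the sketch below counts the cases in the four benchmarks shown in the example output and compares that with a hypothetical full cross product of the same axis values. The cross product is an assumption for illustration only; the PR does not state which combinations the old benchmarks covered.

```python
from itertools import product

# Axis values as they appear in the example output of this PR.
axes = {
    "data_type": ["INTEGRAL", "FLOAT", "DECIMAL", "TIMESTAMP", "DURATION", "STRING"],
    "io": ["FILEPATH", "HOST_BUFFER"],
    "column_selection": ["ALL", "ALTERNATE", "FIRST_HALF", "SECOND_HALF"],
    "row_selection": ["ALL", "BYTE_RANGE", "NROWS", "SKIPFOOTER"],
}

# Hypothetical layout: every combination of every axis value.
# This is NOT confirmed to be how the old benchmarks were organized.
full_cross_product = sum(1 for _ in product(*axes.values()))

# Isolated layout: row counts of the four result tables in this PR.
isolated_cases = {
    "csv_read_data_type": 6,
    "csv_read_io": 2,
    "csv_read_column_selection": 4,
    "csv_read_row_selection": 3,
}
total_isolated = sum(isolated_cases.values())

print(f"full cross product: {full_cross_product} cases")
print(f"isolated axes:      {total_isolated} cases")
```

Varying one axis at a time makes the case count grow with the sum of the axis sizes rather than their product, at the cost of not measuring interactions between axes.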
Example output of the new benchmarks:
Benchmark results
## csv_read_data_type
### [0] Quadro RTX 8000
data_type | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
---|---|---|---|---|---|---|---|---|
INTEGRAL | 5x | 1.140 s | 0.09% | 1.140 s | 0.09% | 235553841 | 1.202 GiB | 668.564 MiB |
FLOAT | 5x | 1.262 s | 0.04% | 1.262 s | 0.04% | 212718321 | 1.041 GiB | 713.885 MiB |
DECIMAL | 5x | 272.787 ms | 0.03% | 272.784 ms | 0.03% | 984060406 | 396.279 MiB | 167.951 MiB |
TIMESTAMP | 7x | 1.681 s | 0.47% | 1.681 s | 0.47% | 159723724 | 2.281 GiB | 814.268 MiB |
DURATION | 7x | 2.121 s | 0.50% | 2.121 s | 0.50% | 126587514 | 2.588 GiB | 971.320 MiB |
STRING | 19x | 496.713 ms | 0.50% | 496.710 ms | 0.50% | 540426462 | 859.526 MiB | 277.082 MiB |
## csv_read_io
### [0] Quadro RTX 8000
io | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
---|---|---|---|---|---|---|---|---|
FILEPATH | 9x | 1.185 s | 0.49% | 1.185 s | 0.49% | 226466264 | 1.445 GiB | 618.876 MiB |
HOST_BUFFER | 5x | 1.170 s | 0.14% | 1.170 s | 0.14% | 229459856 | 1.445 GiB | 618.876 MiB |
## csv_read_column_selection
### [0] Quadro RTX 8000
column_selection | row_selection | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
---|---|---|---|---|---|---|---|---|---|
ALL | ALL | 5x | 1.246 s | 0.18% | 1.246 s | 0.18% | 215514992 | 1.582 GiB | 653.520 MiB |
ALTERNATE | ALL | 5x | 1.128 s | 0.08% | 1.128 s | 0.08% | 119009844 | 1.116 GiB | 648.908 MiB |
FIRST_HALF | ALL | 5x | 1.143 s | 0.07% | 1.143 s | 0.07% | 117443933 | 1.121 GiB | 653.520 MiB |
SECOND_HALF | ALL | 5x | 1.152 s | 0.16% | 1.152 s | 0.16% | 116478469 | 1.121 GiB | 653.520 MiB |
## csv_read_row_selection
### [0] Quadro RTX 8000
column_selection | row_selection | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
---|---|---|---|---|---|---|---|---|---|
ALL | BYTE_RANGE | 5x | 1.245 s | 0.11% | 1.245 s | 0.11% | 215674679 | 1.582 GiB | 653.520 MiB |
ALL | NROWS | 5x | 1.245 s | 0.11% | 1.245 s | 0.11% | 215642738 | 1.582 GiB | 653.520 MiB |
ALL | SKIPFOOTER | 5x | 1.246 s | 0.09% | 1.246 s | 0.09% | 215464113 | 1.582 GiB | 653.520 MiB |
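The `bytes_per_second` column drops by roughly half for the partial column selections even though GPU time barely changes. A quick way to see what the metric is counting is to multiply it back by the GPU time. The sketch below does that for the `csv_read_column_selection` rows, assuming `bytes_per_second` is simply a byte count divided by the mean GPU time; that definition is an inference from the numbers, not something stated in this PR.

```python
# Rows copied from the csv_read_column_selection table above:
# (column_selection, GPU time in seconds, reported bytes_per_second)
rows = [
    ("ALL", 1.246, 215514992),
    ("ALTERNATE", 1.128, 119009844),
    ("FIRST_HALF", 1.143, 117443933),
    ("SECOND_HALF", 1.152, 116478469),
]

MIB = 1 << 20


def implied_mib(gpu_time_s, bytes_per_second):
    """Byte count implied by throughput * time, in MiB.

    Assumes bytes_per_second = bytes / GPU time (an inference, not confirmed).
    """
    return gpu_time_s * bytes_per_second / MIB


implied = {name: implied_mib(t, bps) for name, t, bps in rows}
for name, mib in implied.items():
    print(f"{name:12s} ~{mib:.0f} MiB")
```

Under that assumption the implied byte count is close to 256 MiB for `ALL` and close to 128 MiB for the three partial selections, which suggests the throughput is computed over the data actually selected rather than over the whole encoded file. The printed times are rounded to three decimals, so the results are approximate.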
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
Codecov Report
:exclamation: No coverage uploaded for pull request base (branch-22.10@f485667). Patch has no changes to coverable lines.
Additional details and impacted files
@@ Coverage Diff @@
## branch-22.10 #11678 +/- ##
===============================================
Coverage ? 85.89%
===============================================
Files ? 151
Lines ? 23534
Branches ? 0
===============================================
Hits ? 20214
Misses ? 3320
Partials ? 0
@gpucibot merge