cudf
Update to Thrust 1.17.0
Description
Updates the bundled version of Thrust to 1.17.0. I will run benchmarks and include results in a comment below.
Depends on #11457.
Supersedes #10489, #10577, #10586. Closes #10841. This should be merged concurrently with https://github.com/rapidsai/rapids-cmake/pull/231.
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
Codecov Report
No coverage uploaded for pull request base (branch-22.10@9257549). The diff coverage is n/a.
@@ Coverage Diff @@
## branch-22.10 #11437 +/- ##
===============================================
Coverage ? 86.48%
===============================================
Files ? 144
Lines ? 22850
Branches ? 0
===============================================
Hits ? 19761
Misses ? 3089
Partials ? 0
Can we investigate the increased compile times here? https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-cuda-build/CUDA=11.5/11169/Build_20Metrics_20Report/
The two mixed-join source files used to take the longest to compile by far.
Now group_argmin/max.cu take the most time. Under the hood, they are implemented with thrust::reduce_by_key and several functors. One of the functors has row_lexicographic_comparator as its private comparator.
Previously, group_argmin/max.cu never compiled successfully in CI by default because the compile time was indefinitely long, so we explicitly prevented the functor from being inlined in thrust::reduce_by_key. I have no idea what went wrong this time to make the compile time for those files explode.
+1 on this update. Having par_nosync available is also pretty useful for my current work.
The build metrics report for 18bd5e8 appears to indicate that the patch worked. src/groupby/sort/group_argmax.cu.o took 14:27 min / 6.877 MB, versus 33:15 min / 13.821 MB in the previous build metrics report for a454d0e.
The total libcudf.so size dropped from 461 MB to 407 MB, too!
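For a rough sense of scale, the figures quoted above can be turned into relative reductions. This is only a back-of-the-envelope sketch in Python: the helper names `to_seconds` and `percent_reduction` are made up for illustration, and the inputs are copied from the build metrics numbers in this comment.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'MM:SS' duration string into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_reduction(old: float, new: float) -> float:
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100.0


# group_argmax.cu.o compile time: a454d0e (before) vs. 18bd5e8 (after)
compile_time_drop = percent_reduction(to_seconds("33:15"), to_seconds("14:27"))
# group_argmax.cu.o object size in MB
object_size_drop = percent_reduction(13.821, 6.877)
# total libcudf.so size in MB
library_size_drop = percent_reduction(461, 407)

print(f"compile time: -{compile_time_drop:.1f}%")
print(f"object size:  -{object_size_drop:.1f}%")
print(f"libcudf.so:   -{library_size_drop:.1f}%")
```

With these inputs the sketch reports roughly a 56.5% drop in compile time and a 50.2% drop in object size for group_argmax.cu.o, and an 11.7% drop in the total libcudf.so size.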
@GregoryKimball Can we run the benchmarks on this open PR before merging, so that we won't have to revert if something goes wrong?
I have benchmarked commits 7c621d02301745ebae45e60d1a417127dd4ef337 (Thrust 1.17, changes in this PR) and 217243c1c6bbb6ffc39d68c06ec14162daf6d8aa (Thrust 1.15, base). The results are below. Summary: I don't see anything too significant or worrisome resulting from the updated Thrust version.
I ran with 1 iteration of the benchmarks (total runtime: 95 minutes for each commit on my machine). The results below show any changes that are outside the bounds of -7% to +5% runtime. A lot of these "changes" are statistical noise that happens more readily for smaller inputs, or I/O benchmarks that I do not expect to be stable on my system's SSD.
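The filtering described above can be sketched roughly as follows. This is not the script that produced the table; it is a minimal Python illustration that assumes a row is reported when either its Time or its CPU relative change falls outside the -7% to +5% band (the table contains rows where only one of the two does). The function names are hypothetical, the first three rows are copied from the table below, and the last row is made up to show a change that would be filtered out.

```python
# Thresholds from the comment: report changes outside -7% .. +5%.
LOWER, UPPER = -0.07, 0.05


def outside_band(rel_change, lower=LOWER, upper=UPPER):
    """Return True when a relative change falls outside [lower, upper]."""
    return rel_change < lower or rel_change > upper


def is_reported(time_change, cpu_change):
    # Assumption: a row is kept if either column leaves the band.
    return outside_band(time_change) or outside_band(cpu_change)


# (benchmark name, Time change, CPU change)
rows = [
    ("MultibyteSplitBenchmark/multibyte_split_simple/1/4/25/1073741824/manual_time", -0.1510, -0.1510),
    ("COMPILED_BINARYOP/NULL_MAX_decimal32_decimal32_decimal32/10000/manual_time", 0.0524, 0.0233),
    ("OrcWrite/decimal_file_output/35/1000/1/1/0/manual_time", -0.0400, -0.1366),
    ("hypothetical/within_band/manual_time", -0.0300, 0.0100),  # made-up row
]

reported = [name for name, time_change, cpu_change in rows
            if is_reported(time_change, cpu_change)]
for name in reported:
    print(name)
```

The asymmetric band means slowdowns are flagged at a smaller magnitude (+5%) than speedups (-7%).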
Top Changes
Benchmark    Time    CPU    Time Old    Time New    CPU Old    CPU New
----------------------------------------------------------------------
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000/1/manual_time +0.0748 +0.0134 0 0 0 0
COMPILED_BINARYOP/NULL_MAX_decimal32_decimal32_decimal32/10000/manual_time +0.0524 +0.0233 7 7 23 23
MultibyteSplitBenchmark/multibyte_split_simple/1/4/25/1073741824/manual_time -0.1510 -0.1510 534 453 534 453
OrcWrite/integral_file_output/30/1000/32/1/0/manual_time -0.3457 -0.0857 122 80 78 71
OrcWrite/integral_file_output/30/0/1/0/0/manual_time +0.1364 +0.1095 693 788 363 402
OrcWrite/integral_file_output/30/1000/1/0/0/manual_time -0.0768 -0.0754 568 524 372 344
OrcWrite/integral_file_output/30/0/32/0/0/manual_time -0.0547 -0.0708 73 69 65 60
OrcWrite/floats_file_output/31/0/1/1/0/manual_time -0.1276 -0.0272 677 591 445 433
OrcWrite/floats_file_output/31/1000/1/1/0/manual_time -0.5294 -0.0976 1376 647 554 500
OrcWrite/floats_file_output/31/0/32/1/0/manual_time +0.4990 -0.0905 513 769 376 342
OrcWrite/floats_file_output/31/1000/32/1/0/manual_time +0.1574 -0.0423 500 579 381 364
OrcWrite/floats_file_output/31/0/1/0/0/manual_time +0.7008 +0.0336 499 849 386 399
OrcWrite/floats_file_output/31/1000/1/0/0/manual_time +0.2548 +0.0255 519 651 358 367
OrcWrite/floats_file_output/31/0/32/0/0/manual_time +0.1270 +0.0043 503 567 352 353
OrcWrite/floats_file_output/31/1000/32/0/0/manual_time +0.0645 -0.0758 513 546 377 348
OrcWrite/decimal_file_output/35/1000/1/1/0/manual_time -0.0400 -0.1366 551 529 448 387
OrcWrite/decimal_file_output/35/0/32/1/0/manual_time -0.1372 +0.0387 549 474 354 368
OrcWrite/decimal_file_output/35/0/1/0/0/manual_time +0.0974 -0.0661 412 452 331 309
OrcWrite/decimal_file_output/35/1000/1/0/0/manual_time -0.2619 +0.0355 768 566 309 320
OrcWrite/decimal_file_output/35/0/32/0/0/manual_time +0.3382 -0.0293 396 530 325 315
OrcWrite/decimal_file_output/35/1000/32/0/0/manual_time -0.1151 -0.0505 474 419 325 309
OrcWrite/timestamps_file_output/33/1000/1/1/0/manual_time +0.3392 -0.0255 554 742 418 408
OrcWrite/timestamps_file_output/33/0/1/0/0/manual_time +0.5633 -0.1019 448 701 303 272
OrcWrite/timestamps_file_output/33/0/32/0/0/manual_time -0.0774 -0.0321 68 62 56 54
OrcWrite/timestamps_buffer_output/33/0/32/0/1/manual_time +0.0997 +0.0996 57 63 57 63
OrcWrite/string_file_output/23/0/1/1/0/manual_time -0.4234 +0.0007 1833 1057 899 899
OrcWrite/string_file_output/23/0/1/0/0/manual_time -0.1008 -0.0294 1042 937 809 786
OrcWrite/list_file_output/24/0/1/1/0/manual_time -0.1479 -0.0446 609 519 435 415
OrcWrite/list_file_output/24/0/1/0/0/manual_time +0.0622 -0.0448 648 689 385 368
OrcWrite/struct_file_output/28/0/1/1/0/manual_time -0.2634 -0.0140 1408 1037 919 906
Concatenate/BM_concatenate_nullable_false/64/2/manual_time -0.2041 -0.1359 0 0 0 0
Concatenate/BM_concatenate_nullable_false/512/2/manual_time +0.0527 +0.0256 0 0 0 0
Concatenate/BM_concatenate_nullable_false/4096/2/manual_time +0.0503 +0.0270 0 0 0 0
Concatenate/BM_concatenate_tables_nullable_false/256/8/64/manual_time +0.0511 +0.0457 0 0 0 0
Concatenate/BM_concatenate_tables_nullable_false/256/32/64/manual_time +0.0507 +0.0522 0 0 0 0
Concatenate/BM_concatenate_tables_nullable_false/512/32/64/manual_time +0.0696 +0.0669 0 0 0 0
Concatenate/BM_concatenate_tables_nullable_false/4096/32/64/manual_time +0.0776 +0.0748 0 0 0 0
SetNullmask/SetNullMaskKernel/1024/manual_time -0.0860 -0.0156 3473 3174 19662 19356
CsvWrite/floats_file_output/31/0/manual_time -0.1106 -0.0172 1360 1209 1173 1153
CsvWrite/decimal_file_output/35/0/manual_time -0.3696 -0.0612 1062 669 651 611
CsvWrite/timestamps_file_output/33/0/manual_time -0.1874 -0.0166 1799 1462 1371 1348
CsvWrite/durations_file_output/34/0/manual_time -0.1142 -0.0253 1856 1644 1584 1544
CsvWrite/string_file_output/23/0/manual_time -0.2134 -0.1063 646 508 491 438
ParquetRead/integral_file_input/29/0/1/0/0/manual_time -0.0750 -0.0749 113 104 113 104
ParquetRead/floats_file_input/31/0/1/1/0/manual_time -0.0799 -0.0800 106 98 106 98
ParquetRead/floats_file_input/31/0/1/0/0/manual_time -0.1372 -0.1371 107 92 107 92
ParquetRead/floats_file_input/31/1000/1/0/0/manual_time -0.0904 -0.0903 31 29 32 29
ParquetRead/floats_buffer_input/31/0/1/1/1/manual_time -0.1027 -0.1026 72 64 72 64
ParquetRead/floats_buffer_input/31/1000/1/1/1/manual_time -0.0791 -0.0790 25 23 25 23
ParquetRead/floats_buffer_input/31/0/1/0/1/manual_time -0.1087 -0.1087 72 64 72 64
ParquetRead/floats_buffer_input/31/1000/1/0/1/manual_time -0.0777 -0.0777 25 23 25 23
ParquetRead/decimal_file_input/35/0/1/0/0/manual_time -0.0947 -0.0948 46 42 46 42
ParquetRead/timestamps_file_input/33/0/1/1/0/manual_time -0.0812 -0.0812 120 110 120 110
ParquetRead/timestamps_file_input/33/0/1/0/0/manual_time -0.0706 -0.0703 100 93 100 93
ParquetRead/string_file_input/23/0/32/0/0/manual_time -0.0833 -0.0833 155 142 155 142
ParquetRead/struct_file_input/28/0/1/1/0/manual_time -0.0702 -0.0701 182 170 182 170
StringContains/contains_re/2097152/1/25/manual_time -0.1096 -0.1164 26 23 26 23
StringContains/findall_re/16777216/1/5/manual_time -0.0889 -0.0884 195 178 196 179
StringContains/findall_re/4096/1/25/manual_time +0.0612 +0.0593 0 0 0 0
StringsFromNumeric/strings_from_uint16/16384/manual_time +0.0608 +0.0427 32 34 48 50
StringExtract/four/4096/32/manual_time +0.0891 +0.0873 1 1 1 1
StringFindScalar/starts_with/4096/2048/manual_time +0.0787 +0.0510 0 0 0 0
StringFindScalar/ends_with/4096/2048/manual_time +0.0867 +0.0579 0 0 0 0
RepeatStrings/compute_output_strings_sizes/4096/64/manual_time +0.0511 +0.0333 0 0 0 0
Scatter/double_coalesce_x/4096/2/manual_time +0.0594 +0.0331 17934 18999 33602 34712
Reduction/bool_all/10000/manual_time -0.0755 -0.0234 9499 8782 25504 24907
Reduction/int8_t_all/10000/manual_time -0.0868 -0.0270 9810 8959 25669 24976
Reduction/bool_any/10000/manual_time -0.0758 -0.0241 9520 8798 25401 24788
Reduction/int8_t_any/10000/manual_time -0.0921 -0.0320 9862 8954 25740 24917
ReductionDictionary/int32_t_min/10000/manual_time +0.0989 +0.0585 21218 23317 37296 39477
ReductionDictionary/float_max/1000000/manual_time +0.0557 +0.0395 29099 30719 43650 45375
ReductionDictionary/float_mean/10000/manual_time +0.0531 +0.0390 24936 26259 40739 42328
Reduction/bool_minmax/10000/manual_time -0.0941 -0.0380 13806 12508 29818 28686
Reduction/bool_sum/10000/manual_time -0.0896 -0.0285 10083 9179 26011 25269
Reduction/int8_t_sum/10000/manual_time -0.0773 -0.0242 9669 8921 25640 25020
Repeat/double_nulls/16777216/8/manual_time -0.1098 -0.1214 8 7 8 7
Repeat/double_no_nulls/1024/1/manual_time -0.1090 -0.0844 0 0 0 0
OrcRead/integral_file_input/30/0/1/1/0/manual_time -0.0813 -0.0812 126 115 126 115
OrcRead/integral_file_input/30/1000/1/1/0/manual_time -0.0865 -0.0865 128 117 128 117
OrcRead/integral_file_input/30/0/1/0/0/manual_time -0.0993 -0.0992 126 113 126 113
OrcRead/integral_file_input/30/1000/1/0/0/manual_time -0.0780 -0.0778 125 116 125 116
OrcRead/floats_file_input/31/1000/1/1/0/manual_time -0.0731 -0.0730 117 109 117 109
OrcRead/floats_file_input/31/1000/32/1/0/manual_time -0.0766 -0.0769 112 103 112 103
OrcRead/floats_file_input/31/0/32/0/0/manual_time -0.0843 -0.0841 109 100 109 100
OrcRead/list_file_input/24/0/1/0/0/manual_time -0.0901 -0.0902 144 131 144 131
OrcRead/struct_file_input/28/0/1/1/0/manual_time -0.0792 -0.0794 240 221 240 221
OrcRead/struct_file_input/28/0/1/0/0/manual_time -0.0893 -0.0892 237 216 237 216
CopyIfElse/int16/4096/manual_time +0.0501 +0.0166 0 0 0 0
CopyIfElse/int16/32768/manual_time +0.0554 +0.0260 0 0 0 0
CopyIfElse/int16_no_nulls/4096/manual_time +0.1358 +0.0180 0 0 0 0
CopyIfElse/int16_no_nulls/32768/manual_time +0.0880 +0.0261 0 0 0 0
CopyIfElse/uint32_no_nulls/4096/manual_time +0.1337 +0.0211 0 0 0 0
CopyIfElse/float64_no_nulls/4096/manual_time +0.1034 +0.0212 0 0 0 0
ApplyBooleanMask<int8_t>/int8_1_col/10000000/50 -0.0778 -0.0778 212671 196128 212659 196120
JsonPath/query0/100/300/manual_time -0.0920 -0.0875 0 0 0 0
JsonPath/query0/100000/600/manual_time +0.0508 +0.0507 11 12 11 12
JsonPath/query6/100/600/manual_time +0.0660 +0.0635 0 0 0 0
JsonPath/query8/100/600/manual_time +0.0644 +0.0607 0 0 0 0
TypeDispatcher/fp64_bandwidth_host/4/2048/1/manual_time -0.0764 -0.0310 10266 9481 26304 25489
TypeDispatcher/fp64_bandwidth_host/2/4096/1/manual_time -0.0721 -0.0291 6183 5737 22420 21767
TypeDispatcher/fp64_bandwidth_device/8/1024/1/manual_time -0.0825 -0.0376 13229 12137 29340 28238
TypeDispatcher/fp64_bandwidth_device/4/2048/1/manual_time -0.0707 -0.0301 10782 10020 26893 26084
TypeDispatcher/fp64_bandwidth_no/2/1024/1/manual_time -0.1232 -0.0220 4399 3857 20863 20404
TypeDispatcher/fp64_bandwidth_no/4/1024/1/manual_time -0.1836 -0.0492 5361 4377 21851 20777
TypeDispatcher/fp64_bandwidth_no/8/1024/1/manual_time -0.2050 -0.0636 6786 5395 23315 21832
TypeDispatcher/fp64_bandwidth_no/1/2048/1/manual_time -0.0736 -0.0127 3960 3668 20411 20151
TypeDispatcher/fp64_bandwidth_no/2/2048/1/manual_time -0.1280 -0.0171 4489 3915 20790 20435
TypeDispatcher/fp64_bandwidth_no/4/2048/1/manual_time -0.1560 -0.0345 5230 4414 21546 20803
TypeDispatcher/fp64_bandwidth_no/8/2048/1/manual_time +0.2073 +0.0594 5671 6846 21972 23277
TypeDispatcher/fp64_bandwidth_no/1/4096/1/manual_time -0.0905 -0.0141 4041 3675 20322 20035
TypeDispatcher/fp64_bandwidth_no/2/4096/1/manual_time -0.0897 -0.0155 4395 4001 20868 20545
TypeDispatcher/fp64_bandwidth_no/4/4096/1/manual_time +0.1863 +0.0481 4506 5346 20837 21839
TypeDispatcher/fp64_bandwidth_no/2/8192/1/manual_time +0.1220 +0.0176 4047 4540 20541 20903
TypeDispatcher/fp64_bandwidth_no/1/16384/1/manual_time +0.0985 +0.0119 3832 4209 20303 20544
AST<int32_t, TreeType::IMBALANCED_LEFT, true, true>/ast_int32_imbalanced_reuse_nulls/100000/1/manual_time -0.0925 -0.0544 0 0 0 0
ParquetWrite/integral_file_output/29/1000/1/1/0/manual_time -0.2648 -0.0362 250 184 178 172
ParquetWrite/integral_file_output/29/0/32/1/0/manual_time -0.1182 -0.0551 64 56 56 53
ParquetWrite/integral_file_output/29/0/1/0/0/manual_time -0.5755 -0.0247 1136 482 401 391
ParquetWrite/integral_file_output/29/1000/1/0/0/manual_time -0.2619 -0.0651 232 172 169 158
ParquetWrite/integral_file_output/29/0/32/0/0/manual_time -0.1309 -0.0620 61 53 52 49
ParquetWrite/integral_file_output/29/1000/32/0/0/manual_time -0.1166 -0.0633 49 43 42 39
ParquetWrite/floats_file_output/31/0/1/1/0/manual_time -0.1549 -0.0979 609 514 470 424
ParquetWrite/floats_file_output/31/1000/1/1/0/manual_time -0.1381 -0.0600 151 130 125 117
ParquetWrite/floats_file_output/31/0/32/1/0/manual_time -0.4697 -0.2097 84 45 51 40
ParquetWrite/floats_file_output/31/0/1/0/0/manual_time -0.5106 -0.0254 938 459 397 387
ParquetWrite/floats_file_output/31/1000/1/0/0/manual_time -0.2083 -0.0743 153 121 119 111
ParquetWrite/floats_file_output/31/0/32/0/0/manual_time -0.1193 -0.0525 48 43 40 38
ParquetWrite/floats_buffer_output/31/1000/1/0/1/manual_time -0.1645 -0.1645 166 139 166 139
ParquetWrite/decimal_file_output/35/0/1/1/0/manual_time -0.0741 -0.0140 317 294 269 266
ParquetWrite/decimal_file_output/35/1000/1/1/0/manual_time -0.0902 -0.0256 401 365 363 353
ParquetWrite/decimal_file_output/35/0/1/0/0/manual_time -0.3771 -0.0451 283 176 172 164
ParquetWrite/decimal_file_output/35/1000/1/0/0/manual_time -0.2757 -0.0337 239 173 164 159
ParquetWrite/decimal_file_output/35/0/32/0/0/manual_time -0.4012 -0.0695 243 146 142 133
ParquetWrite/decimal_file_output/35/1000/32/0/0/manual_time -0.2798 -0.0665 181 131 127 119
ParquetWrite/timestamps_file_output/33/0/1/1/0/manual_time -0.4736 -0.1486 971 511 474 404
ParquetWrite/timestamps_file_output/33/1000/1/1/0/manual_time -0.1853 -0.0924 132 108 108 98
ParquetWrite/timestamps_file_output/33/0/32/1/0/manual_time -0.1366 -0.0930 51 44 45 40
ParquetWrite/timestamps_file_output/33/0/1/0/0/manual_time -0.1584 +0.0025 503 423 361 362
ParquetWrite/timestamps_file_output/33/1000/1/0/0/manual_time -0.2067 -0.0944 129 102 102 93
ParquetWrite/timestamps_file_output/33/0/32/0/0/manual_time -0.2785 -0.1351 51 37 38 33
ParquetWrite/timestamps_file_output/33/1000/32/0/0/manual_time -0.1326 -0.1035 30 26 26 23
ParquetWrite/durations_file_output/34/0/1/1/0/manual_time -0.1439 -0.0064 572 489 419 416
ParquetWrite/durations_file_output/34/1000/1/1/0/manual_time -0.1863 -0.0692 134 109 107 100
ParquetWrite/durations_file_output/34/0/32/1/0/manual_time -0.1738 -0.1645 52 43 47 40
ParquetWrite/durations_file_output/34/0/1/0/0/manual_time -0.3330 -0.0856 641 427 387 353
ParquetWrite/durations_file_output/34/1000/1/0/0/manual_time -0.1568 -0.0482 128 108 101 97
ParquetWrite/durations_file_output/34/0/32/0/0/manual_time -0.1460 -0.0809 41 35 35 32
ParquetWrite/durations_file_output/34/1000/32/0/0/manual_time -0.1286 -0.1046 30 26 25 23
ParquetWrite/durations_buffer_output/34/1000/1/0/1/manual_time -0.1315 -0.1315 143 124 143 125
ParquetWrite/string_file_output/23/0/1/1/0/manual_time -0.1595 -0.1181 712 599 568 501
ParquetWrite/string_file_output/23/1000/1/1/0/manual_time -0.1440 -0.0523 80 69 66 62
ParquetWrite/string_file_output/23/0/32/1/0/manual_time -0.2298 -0.0868 720 554 506 462
ParquetWrite/string_file_output/23/1000/32/1/0/manual_time -0.4234 -0.2866 43 25 31 22
ParquetWrite/string_file_output/23/0/1/0/0/manual_time -0.1762 -0.0239 641 528 456 445
ParquetWrite/string_file_output/23/1000/1/0/0/manual_time -0.2123 -0.0450 86 68 64 61
ParquetWrite/string_file_output/23/0/32/0/0/manual_time -0.2076 -0.0489 679 538 450 428
ParquetWrite/string_file_output/23/1000/32/0/0/manual_time -0.0894 -0.0656 26 24 22 21
ParquetWrite/list_file_output/24/0/1/1/0/manual_time -0.1047 -0.0397 1119 1001 937 900
ParquetWrite/list_file_output/24/0/1/0/0/manual_time -0.2311 -0.0214 783 602 608 595
ParquetWrite/list_file_output/24/1000/1/0/0/manual_time -0.1403 -0.0354 427 367 370 357
ParquetWrite/list_file_output/24/1000/32/0/0/manual_time +0.4736 +0.0128 249 366 242 245
ParquetWrite/struct_file_output/28/0/1/1/0/manual_time -0.0927 -0.0303 697 632 539 523
ParquetWrite/struct_file_output/28/0/32/1/0/manual_time +0.3057 +0.0242 508 663 366 374
ParquetWrite/struct_file_output/28/1000/32/1/0/manual_time -0.1025 -0.1352 62 56 47 41
ParquetWrite/struct_file_output/28/0/1/0/0/manual_time -0.1608 -0.0708 649 545 429 399
ParquetWrite/struct_file_output/28/0/32/0/0/manual_time -0.2489 -0.1268 716 538 370 323
ParquetWrite/struct_file_output/28/1000/32/0/0/manual_time -0.2151 -0.1953 68 53 50 40
TextTokenize/single/262144/2048/manual_time -0.0892 -0.0892 22 20 22 20
TextTokenize/count/262144/2048/manual_time -0.0951 -0.0949 10 9 10 9
ContiguousSplit/4Gb512ColsValidity/4294967296/512/256/1/iterations:8/manual_time -0.1039 -0.1037 43 38 43 38
ContiguousSplit/1Gb512ColsValidity/1073741824/512/256/1/iterations:8/manual_time -0.1147 -0.1146 33 29 33 29
Search/Table/8/1000/manual_time -0.0715 -0.0603 0 0 0 0
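A note on reading the table: the Time and CPU columns appear to be relative changes, computed as (new - old) / old, so negative values mean the Thrust 1.17 build was faster. That reading is an assumption based on the column layout rather than something stated in the comment. The short Python sketch below checks it against two rows from the table; `relative_change` is a hypothetical helper name.

```python
def relative_change(old, new):
    """Relative change of `new` versus `old`; negative means faster."""
    return (new - old) / abs(old)


# ApplyBooleanMask<int8_t>/int8_1_col/10000000/50: Time Old 212671, Time New 196128
apply_mask = relative_change(212671, 196128)
# Scatter/double_coalesce_x/4096/2/manual_time: Time Old 17934, Time New 18999
scatter = relative_change(17934, 18999)

print(f"{apply_mask:+.4f}")  # the table lists -0.0778 for this row
print(f"{scatter:+.4f}")     # the table lists +0.0594 for this row
```

Rows with small displayed values (including the many 0 entries) do not reproduce exactly this way, presumably because the Old and New columns are rounded for display.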
rerun tests
@gpucibot merge