Refactor joins for conditional semis and antis

Open DanialJavady96 opened this issue 1 year ago • 15 comments

Contributes to #10039

Currently, conditional semi and anti joins rely on an implementation that was designed to return results from both tables involved in the join. Semi and anti joins only return rows from the left table, so this leads to wasteful allocation that can be avoided in these two cases.

Description

Add a new kernel shared by semi and anti joins. Add new device functions that use only one shared-memory array for caching.

Tests pass on my 3080.
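
To make the single-output idea concrete, here is a minimal host-side reference sketch. It is not the kernel from this PR and not cudf's API: the names `join_kind` and `conditional_left_filter_join` are hypothetical, the tables are reduced to plain `int` columns, and the predicate is an ordinary callable. It only illustrates why a conditional left semi or anti join needs one output array of left row indices.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference, NOT the cudf kernel.
// LEFT_SEMI keeps each left row that has at least one right row
// satisfying the predicate; LEFT_ANTI keeps the left rows that have none.
enum class join_kind { LEFT_SEMI, LEFT_ANTI };

std::vector<std::size_t> conditional_left_filter_join(
  std::vector<int> const& left,
  std::vector<int> const& right,
  std::function<bool(int, int)> const& predicate,
  join_kind kind)
{
  std::vector<std::size_t> left_indices;  // the only output array
  for (std::size_t i = 0; i < left.size(); ++i) {
    bool matched = false;
    // Stop scanning the right table as soon as one match is found.
    for (std::size_t j = 0; j < right.size() && !matched; ++j) {
      matched = predicate(left[i], right[j]);
    }
    // Semi keeps matched rows, anti keeps unmatched rows.
    if (matched == (kind == join_kind::LEFT_SEMI)) { left_indices.push_back(i); }
  }
  return left_indices;
}
```

With `left = {1, 5, 9}`, `right = {4, 6}` and the predicate `a < b`, the semi join keeps left rows 0 and 1 and the anti join keeps row 2. No right-side index is ever recorded, which is the allocation the description above aims to drop.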

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [ ] The documentation is up to date with these changes.

DanialJavady96 avatar Dec 18 '23 17:12 DanialJavady96

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Dec 18 '23 17:12 copy-pr-bot[bot]

CC @bdice @vyasr: please let me know if any changes need to be made or if I misunderstood anything. I assume that, to keep the PR small, this shouldn't touch the size APIs.

DanialJavady96 avatar Dec 18 '23 17:12 DanialJavady96

I benchmarked this branch against branch-24.02, and the performance gains were negligible and statistically insignificant (1-3%). However, when I removed the compute_size kernels and pessimistically assumed that the output size is always the left table size N (trading memory for runtime), the gains were significant.
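
The pessimistic-size idea can be sketched on the host as follows. This is an illustration under assumptions, not the code in this PR: `gather_with_upper_bound` and `keep_row` are hypothetical names, and a real device version would need an atomic or scan-based write position rather than a serial counter. The point is that a semi or anti join can never emit more than N rows, so one buffer of size N replaces the separate size-computation pass.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the "pessimistic size" approach:
// allocate the upper bound (left table size N), fill in a single pass,
// then shrink to the number of rows actually written.
template <typename KeepRow>
std::vector<std::size_t> gather_with_upper_bound(std::size_t left_size, KeepRow keep_row)
{
  std::vector<std::size_t> out(left_size);  // worst case: every left row is kept
  std::size_t written = 0;
  for (std::size_t i = 0; i < left_size; ++i) {
    if (keep_row(i)) { out[written++] = i; }
  }
  out.resize(written);  // drop the unused tail; no separate counting pass ran
  return out;
}
```

For example, calling it with `left_size = 6` and a `keep_row` that accepts even indices yields `{0, 2, 4}`. The cost is that the buffer is temporarily as large as the left table even when few rows survive.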

My specs are as follows:

  • CPU: 12th Gen Intel(R) Core(TM) i9-12900K, 3200 MHz, 16 cores, 24 logical processors
  • GPU: RTX 3080
  • RAM: 64 GB DDR5
  • OS: WSL2 on a Windows 11 host

DanialJavady96 avatar Dec 20 '23 18:12 DanialJavady96

/ok to test

vuule avatar Dec 20 '23 20:12 vuule

@vyasr would you please take a look when you get back?

GregoryKimball avatar Jan 03 '24 18:01 GregoryKimball

Please note that this PR addresses part of https://github.com/rapidsai/cudf/issues/10039

GregoryKimball avatar Jan 03 '24 19:01 GregoryKimball

/ok to test

PointKernel avatar Jan 03 '24 20:01 PointKernel

@DanialJavady96 Making this ready for review to draw proper attention from reviewers

PointKernel avatar Jan 03 '24 20:01 PointKernel

On this branch:

ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit/100000/100000/manual_time               314 ms          314 ms            2
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit/100000/400000/manual_time              1138 ms         1138 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit/100000/1000000/manual_time             2771 ms         2771 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit/100000/100000/manual_time               322 ms          322 ms            2
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit/100000/400000/manual_time              1161 ms         1162 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit/100000/1000000/manual_time             2836 ms         2836 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit_nulls/100000/100000/manual_time         540 ms          540 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit_nulls/100000/400000/manual_time        1935 ms         1935 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit_nulls/100000/1000000/manual_time       4747 ms         4747 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit_nulls/100000/100000/manual_time         548 ms          548 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit_nulls/100000/400000/manual_time        2001 ms         2001 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit_nulls/100000/1000000/manual_time       4881 ms         4881 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit/100000/100000/manual_time               323 ms          323 ms            2
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit/100000/400000/manual_time              1155 ms         1155 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit/100000/1000000/manual_time             2784 ms         2784 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit/100000/100000/manual_time               327 ms          327 ms            2
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit/100000/400000/manual_time              1163 ms         1163 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit/100000/1000000/manual_time             2906 ms         2906 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit_nulls/100000/100000/manual_time         544 ms          544 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit_nulls/100000/400000/manual_time        1986 ms         1986 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit_nulls/100000/1000000/manual_time       4774 ms         4774 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit_nulls/100000/100000/manual_time         559 ms          559 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit_nulls/100000/400000/manual_time        2045 ms         2045 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit_nulls/100000/1000000/manual_time       4925 ms         4925 ms            1

On branch-24.02:

ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit/100000/100000/manual_time               317 ms          317 ms            2
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit/100000/400000/manual_time              1138 ms         1137 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit/100000/1000000/manual_time             2788 ms         2788 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit/100000/100000/manual_time               323 ms          323 ms            2
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit/100000/400000/manual_time              1167 ms         1167 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit/100000/1000000/manual_time             2861 ms         2861 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit_nulls/100000/100000/manual_time         543 ms          543 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit_nulls/100000/400000/manual_time        1952 ms         1952 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit_nulls/100000/1000000/manual_time       4830 ms         4830 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit_nulls/100000/100000/manual_time         576 ms          576 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit_nulls/100000/400000/manual_time        2018 ms         2018 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit_nulls/100000/1000000/manual_time       4931 ms         4931 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit/100000/100000/manual_time               323 ms          323 ms            2
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit/100000/400000/manual_time              1151 ms         1151 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit/100000/1000000/manual_time             2841 ms         2841 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit/100000/100000/manual_time               330 ms          330 ms            2
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit/100000/400000/manual_time              1180 ms         1180 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit/100000/1000000/manual_time             2961 ms         2961 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit_nulls/100000/100000/manual_time         540 ms          540 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit_nulls/100000/400000/manual_time        1962 ms         1962 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit_nulls/100000/1000000/manual_time       4813 ms         4813 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit_nulls/100000/100000/manual_time         566 ms          566 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit_nulls/100000/400000/manual_time        2085 ms         2085 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit_nulls/100000/1000000/manual_time       5063 ms         5063 ms            1

Unfortunately, I'm not seeing significant speedups. Would it make sense to include the removal of the join size kernels in this PR? @PointKernel

DanialJavady96 avatar Jan 04 '24 21:01 DanialJavady96

/ok to test

PointKernel avatar Jan 06 '24 00:01 PointKernel

/ok to test

bdice avatar Apr 16 '24 15:04 bdice

Do we need any expanded tests? I'll try to look into that.

Responding to myself -- I think our testing looks okay for now. I don't know of anything that would need to be changed. https://github.com/rapidsai/cudf/blob/branch-24.06/cpp/tests/join/conditional_join_tests.cu

bdice avatar Apr 17 '24 02:04 bdice

/ok to test

bdice avatar Apr 18 '24 22:04 bdice

@bdice

Benchmark                                                                                                    Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------------------------------------------
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit/100000/100000/manual_time               311 ms          312 ms            2
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit/100000/400000/manual_time              1126 ms         1126 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit/100000/1000000/manual_time             2748 ms         2748 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit/100000/100000/manual_time               318 ms          318 ms            2
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit/100000/400000/manual_time              1147 ms         1147 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit/100000/1000000/manual_time             2796 ms         2796 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit_nulls/100000/100000/manual_time         415 ms          415 ms            2
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit_nulls/100000/400000/manual_time        1485 ms         1485 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_anti_join_32bit_nulls/100000/1000000/manual_time       3605 ms         3605 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit_nulls/100000/100000/manual_time         417 ms          417 ms            2
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit_nulls/100000/400000/manual_time        1497 ms         1497 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_anti_join_64bit_nulls/100000/1000000/manual_time       3651 ms         3651 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit/100000/100000/manual_time               310 ms          310 ms            2
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit/100000/400000/manual_time              1117 ms         1117 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit/100000/1000000/manual_time             2725 ms         2725 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit/100000/100000/manual_time               316 ms          316 ms            2
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit/100000/400000/manual_time              1142 ms         1142 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit/100000/1000000/manual_time             2782 ms         2782 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit_nulls/100000/100000/manual_time         412 ms          412 ms            2
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit_nulls/100000/400000/manual_time        1482 ms         1482 ms            1
ConditionalJoin<int32_t, int32_t>/conditional_left_semi_join_32bit_nulls/100000/1000000/manual_time       3615 ms         3615 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit_nulls/100000/100000/manual_time         418 ms          418 ms            2
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit_nulls/100000/400000/manual_time        1501 ms         1501 ms            1
ConditionalJoin<int64_t, int64_t>/conditional_left_semi_join_64bit_nulls/100000/1000000/manual_time       3658 ms         3658 ms            1
Compared to the benchmarks in https://github.com/rapidsai/cudf/pull/14646#issuecomment-1877775716, this looks pretty good! Some of the gains are quite significant.
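
To put a number on those gains, a small helper (hypothetical, for illustration only) can compute the percentage reduction in run time between two timings. The figures used below are copied from the tables in this thread, taking the earlier "On this branch" run as the baseline, which is an assumption about which run the comparison refers to.

```cpp
// Percentage reduction in run time going from before_ms to after_ms.
// Assumes before_ms > 0. A positive result means the newer run is faster.
double percent_faster(double before_ms, double after_ms)
{
  return 100.0 * (before_ms - after_ms) / before_ms;
}
```

For conditional_left_anti_join_32bit_nulls/100000/1000000, `percent_faster(4747, 3605)` is roughly 24%. For the non-null conditional_left_anti_join_32bit/100000/1000000 case, `percent_faster(2771, 2748)` is under 1%, so the improvement is concentrated in the benchmarks with nulls.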

ZelboK avatar Apr 22 '24 21:04 ZelboK

/ok to test

PointKernel avatar Apr 24 '24 17:04 PointKernel

/merge

bdice avatar Apr 30 '24 14:04 bdice