datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

feat: Support equijoins in `NestedLoopJoin`

Open jonathanc-n opened this issue 6 months ago • 1 comments

Which issue does this PR close?

  • Closes #.

Rationale for this change

We want to support equijoins in NestedLoopJoin in the case where one of the tables in the join is very small.

What changes are included in this PR?

I have added a nested_loop_equijoin_threshold to the OptimizerOptions which has a default value of 5 (same as DuckDB, here). This is the threshold for the number of rows that can be in either table so that the physical planner will choose a NestedLoopJoinExec over SortMergeJoin and HashJoin.

If either table has less than 5 rows then we will pass the join_on expressions to the join_filter.

Are these changes tested?

By existing tests

jonathanc-n avatar Jun 18 '25 19:06 jonathanc-n

I will try to run a benchmark on a table with smaller rows and return the result when finished.

jonathanc-n avatar Jun 18 '25 19:06 jonathanc-n

The upside is that it performs well when both tables are extremely small < 50 rows 😆

jonathanc-n avatar Jun 19 '25 01:06 jonathanc-n

Maybe it'll be clear to change the title as "Use NestedLoopJoin instead of HashJoin/SortMergeJoin for small tables", I'm confused when I saw the PR title first.

xudong963 avatar Jun 19 '25 10:06 xudong963

The upside is that it performs well when both tables are extremely small < 50 rows 😆

Do we have some benchmark results?

xudong963 avatar Jun 19 '25 10:06 xudong963

@xudong963 These were tests run with one of the sides having 5 rows:

Click to expand

joins/HashJoin/l=16_r=5 time:   [9.5541 µs 9.6068 µs 9.6640 µs]
                       change: [-1.2426% -0.2737% +0.6768%] (p = 0.57 > 0.05)
                       No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
 5 (5.00%) low mild
 8 (8.00%) high mild
 2 (2.00%) high severe
joins/NestedLoopJoin/l=16_r=5
                       time:   [8.3347 µs 8.4427 µs 8.5472 µs]
                       change: [+16.951% +17.961% +19.019%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) high mild
joins/HashJoin/l=64_r=5 time:   [9.7575 µs 9.8982 µs 10.029 µs]
                       change: [-6.8033% -2.8109% -0.1691%] (p = 0.11 > 0.05)
                       No change in performance detected.
Found 30 outliers among 100 measurements (30.00%)
 8 (8.00%) low severe
 5 (5.00%) low mild
 2 (2.00%) high mild
 15 (15.00%) high severe
joins/NestedLoopJoin/l=64_r=5
                       time:   [10.104 µs 10.157 µs 10.228 µs]
                       change: [+12.067% +12.951% +13.830%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) high mild
joins/HashJoin/l=256_r=5
                       time:   [10.351 µs 10.460 µs 10.576 µs]
                       change: [-0.9628% +0.0307% +1.0802%] (p = 0.95 > 0.05)
                       No change in performance detected.
joins/NestedLoopJoin/l=256_r=5
                       time:   [20.469 µs 20.519 µs 20.577 µs]
                       change: [+8.3494% +9.3946% +10.285%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
 3 (3.00%) low mild
 3 (3.00%) high severe
joins/HashJoin/l=1024_r=5
                       time:   [24.460 µs 24.713 µs 24.901 µs]
                       change: [+21.363% +31.250% +41.778%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
 16 (16.00%) low severe
 3 (3.00%) low mild
 4 (4.00%) high mild
joins/NestedLoopJoin/l=1024_r=5
                       time:   [67.322 µs 67.751 µs 68.268 µs]
                       change: [+6.0403% +8.0079% +9.5998%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
 1 (1.00%) high mild
 2 (2.00%) high severe
joins/HashJoin/l=4096_r=5
                       time:   [22.298 µs 22.814 µs 23.411 µs]
                       change: [-4.7502% -0.0368% +4.6566%] (p = 0.99 > 0.05)
                       No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
 5 (5.00%) high mild
 1 (1.00%) high severe
joins/NestedLoopJoin/l=4096_r=5
                       time:   [258.64 µs 259.56 µs 260.40 µs]
                       change: [+5.1991% +8.4282% +10.949%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 22 outliers among 100 measurements (22.00%)
 13 (13.00%) low severe
 3 (3.00%) low mild
 5 (5.00%) high mild
 1 (1.00%) high severe
joins/HashJoin/l=32768_r=5
                       time:   [118.18 µs 120.61 µs 124.07 µs]
                       change: [-0.5396% +0.8988% +2.6146%] (p = 0.26 > 0.05)
                       No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
 3 (3.00%) high mild
 1 (1.00%) high severe
joins/NestedLoopJoin/l=32768_r=5
                       time:   [2.3731 ms 2.4175 ms 2.4715 ms]
                       change: [-2.4822% +2.7216% +7.0548%] (p = 0.29 > 0.05)
                       No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
 6 (6.00%) high mild
 5 (5.00%) high severe
joins/HashJoin/l=5_r=16 time:   [9.3953 µs 9.4891 µs 9.5818 µs]
                       change: [+0.2660% +2.3151% +3.8663%] (p = 0.01 < 0.05)
                       Change within noise threshold.
joins/NestedLoopJoin/l=5_r=16
                       time:   [8.4094 µs 8.4620 µs 8.5189 µs]
                       change: [+7.3597% +13.271% +16.947%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
 2 (2.00%) low mild
joins/HashJoin/l=5_r=64 time:   [10.281 µs 10.304 µs 10.322 µs]
                       change: [+5.5270% +7.6174% +9.6222%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) low mild
joins/NestedLoopJoin/l=5_r=64
                       time:   [10.511 µs 10.644 µs 10.845 µs]
                       change: [+7.7402% +11.531% +14.867%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
 1 (1.00%) low mild
 2 (2.00%) high mild
 1 (1.00%) high severe
joins/HashJoin/l=5_r=256
                       time:   [9.8597 µs 9.9384 µs 10.007 µs]
                       change: [-1.8462% -0.6659% +0.4999%] (p = 0.29 > 0.05)
                       No change in performance detected.
joins/NestedLoopJoin/l=5_r=256
                       time:   [21.108 µs 21.195 µs 21.280 µs]
                       change: [+1.5413% +6.7273% +10.204%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) high mild
joins/HashJoin/l=5_r=1024
                       time:   [11.088 µs 11.182 µs 11.279 µs]
                       change: [-0.8211% +0.3052% +1.3850%] (p = 0.59 > 0.05)
                       No change in performance detected.
joins/NestedLoopJoin/l=5_r=1024
                       time:   [69.937 µs 76.254 µs 83.638 µs]
                       change: [+5.7682% +9.4840% +14.122%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
 4 (4.00%) high severe
joins/HashJoin/l=5_r=4096
                       time:   [16.550 µs 16.633 µs 16.713 µs]
                       change: [-1.4548% -0.7755% -0.0668%] (p = 0.03 < 0.05)
                       Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
 1 (1.00%) high mild
joins/NestedLoopJoin/l=5_r=4096
                       time:   [271.97 µs 272.96 µs 273.65 µs]
                       change: [+13.243% +14.024% +14.795%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
 7 (7.00%) low severe
 1 (1.00%) low mild
 5 (5.00%) high mild
 2 (2.00%) high severe
joins/HashJoin/l=5_r=32768
                       time:   [79.925 µs 81.041 µs 82.978 µs]
                       change: [+2.3887% +3.6331% +5.0589%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
 1 (1.00%) high mild
 1 (1.00%) high severe
joins/NestedLoopJoin/l=5_r=32768
                       time:   [2.6531 ms 2.7439 ms 2.8609 ms]
                       change: [+9.0545% +13.275% +18.591%] (p = 0.00 < 0.05)
                       Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
 4 (4.00%) high mild
 4 (4.00%) high severe

Interesting I noticed I wasn't using a filter for the NestedLoopJoin and it was hardly performing faster. The benchmarks I just sent were the benchmarks using the filter for NLJ, which is why there are some regressions. I'll see if I can take some time to see how we can speed up NestedLoopJoin.

The reason why NestedLoopJoin is so slow is due to the cartesian product which is being calculated + running the filter through all the indices (which is also memory expensive). It would probably be faster to just keep one side in memory if possible. and have the other side run a block nested loop join on it. In that case, equijoins tend to be much faster on a smaller table.

jonathanc-n avatar Jun 19 '25 13:06 jonathanc-n

I'll try to open a pull request later for creating a performance bench file for specifically benchmarking joins.

jonathanc-n avatar Jun 19 '25 13:06 jonathanc-n