Add non-equi join support using draken representation

Open Copilot opened this issue 2 weeks ago • 0 comments

Thank you for opening a Pull Request!

We appreciate your contribution to Opteryx. Your time and effort make a difference, and we're excited to review your changes. To help ensure a smooth review process, please check the following:

Checklist for a Successful PR

  • [x] Start the conversation: If you haven't already, raise a bug/feature request or start a discussion. This ensures alignment on the change and approach.
  • [x] Run the tests: Confirm that all tests pass without errors.
  • [x] Maintain code coverage: If you've added or modified source code, ensure new tests are added to the test suite.
  • [x] Update documentation and tests (if applicable): If your changes impact functionality, make sure the relevant docs and test cases are updated.

Description

Implements non-equi joins (!=, >, >=, <, <=) using draken's columnar representation layer with a straightforward nested loop algorithm.

Core Implementation:

  • non_equi_join.pyx: Cython function that converts Arrow tables via Morsel.from_arrow() and performs element-wise comparisons using vector indexing. The nested loop is O(n×m) in the left and right row counts and skips NULL values.
  • NonEquiJoinNode: Operator following existing join patterns: it buffers the left relation and streams the right.
  • Planner integration: Routes non_equi join type to new operator.
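For readers who want to see the shape of the algorithm, here is a minimal pure-Python sketch of a NULL-skipping nested loop join over two value columns. It is an illustration only, not the Cython code in this PR: the function name `nested_loop_non_equi_join` and the comparison names other than "greater_than_or_equals" are assumptions, and plain lists stand in for draken vectors.

```python
import operator

# Comparison names are hypothetical, apart from "greater_than_or_equals",
# which is the name used in the PR's own example.
COMPARATORS = {
    "not_equals": operator.ne,
    "greater_than": operator.gt,
    "greater_than_or_equals": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equals": operator.le,
}


def nested_loop_non_equi_join(left_values, right_values, comparison):
    """Return (left_idx, right_idx) for every pair satisfying the comparison.

    Plain Python lists stand in for columnar vectors; None represents NULL.
    """
    compare = COMPARATORS[comparison]
    left_idx, right_idx = [], []
    for i, left_value in enumerate(left_values):
        if left_value is None:
            continue  # a NULL on the left never matches
        for j, right_value in enumerate(right_values):
            if right_value is None:
                continue  # a NULL on the right never matches
            if compare(left_value, right_value):
                left_idx.append(i)
                right_idx.append(j)
    return left_idx, right_idx


if __name__ == "__main__":
    print(nested_loop_non_equi_join([1, None, 5], [3, 5, None], "less_than"))
```

Every left row is compared against every right row, which is where the O(n×m) cost comes from. This matches the "dumb and slow first" goal stated in the original prompt.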

Example:

from opteryx.compiled.joins import non_equi_nested_loop_join
import pyarrow as pa

employees = pa.table({"name": ["Alice", "Bob"], "salary": [45000, 62000]})
grades = pa.table({"grade": ["Junior", "Mid"], "min_salary": [40000, 55000]})

# Find employees qualified for each grade (salary >= min_salary)
left_idx, right_idx = non_equi_nested_loop_join(
    employees, grades, "salary", "min_salary", "greater_than_or_equals"
)
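As a sanity check on the example data, the matching pairs can be worked out by hand in plain Python. This sketch does not call the compiled function. It assumes the returned indices refer to row positions in `employees` and `grades`; the PR does not state the return type or the ordering of the results, so treat the pair order below as illustrative.

```python
names = ["Alice", "Bob"]
salaries = [45000, 62000]
grade_names = ["Junior", "Mid"]
min_salaries = [40000, 55000]

# Every (employee row, grade row) pair where salary >= min_salary.
pairs = [
    (i, j)
    for i, salary in enumerate(salaries)
    for j, min_salary in enumerate(min_salaries)
    if salary >= min_salary
]

for i, j in pairs:
    print(f"{names[i]} qualifies for {grade_names[j]}")
```

Alice (45000) meets only the Junior threshold, while Bob (62000) meets both, so three pairs satisfy the condition.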

Changes:

  • New files: non_equi_join.pyx, NonEquiJoinNode, comprehensive tests (7 cases), usage examples
  • Modified: operators module exports, physical planner routing, examples README
  • 613 lines added across 7 files

Fixes: #<issue_number_goes_here>

Please replace <issue_number_goes_here> with the corresponding issue number.


Thank you for contributing to Opteryx! 🎉

Original prompt

add in support for non equi joins.

this should use draken as the representation, using from_arrow if it gets arrow tables.

use nested loop as the core algorithm (not the existing implementation, this is optimized for equi joins)

support

not eq, gt, gt eq, lt, lt eq

as the first set of supported comparisons

we're not optimizing at this point, dumb and slow first.


Copilot, Nov 15 '25 19:11