Conversion from pyarrow `Expression`s to `QueryBuilder` expressions

Open IvoDD opened this issue 9 months ago • 0 comments

Reference Issues/PRs

Monday ref: 8471524478

What does this implement or fix?

Introduces ExpressionNode._from_pyarrow_expression_str.

This is required for predicate pushdown integration with polars. E.g. when we do:

import polars as pl

lf = pl.scan_arcticdb(arctic_identifier)  # scan_arcticdb is not implemented yet
lf = lf.filter(pl.col("float_col") <= 40.1)
lf.collect()

When collect is called, the filter is pushed down to our (to be implemented) polars.scan_arcticdb via a callback. The filter arrives as a string that can be evaluated to a pyarrow expression. For reference, string construction happens here.
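To make the conversion step concrete, here is a minimal sketch of one way such a string could be turned into an expression tree. It assumes the filter string looks like `(pa.compute.field('float_col') <= 40.1)`; the exact format polars emits may differ. The nested tuples and the `parse_filter` helper are hypothetical stand-ins for QueryBuilder expressions, and this is not necessarily how `ExpressionNode._from_pyarrow_expression_str` is implemented. The sketch walks the Python AST of the string instead of calling `eval`, so it needs only the standard library.

```python
import ast

# Operators this sketch understands; anything else is rejected.
_COMPARE_OPS = {
    ast.Lt: "<", ast.LtE: "<=", ast.Gt: ">",
    ast.GtE: ">=", ast.Eq: "==", ast.NotEq: "!=",
}
_BOOLEAN_OPS = {ast.BitAnd: "&", ast.BitOr: "|"}

# Assumed spelling of the pyarrow field constructor in the filter string.
_FIELD_CALLS = ("pa.compute.field", "pc.field")


def _convert(node):
    """Recursively convert an AST node into a nested tuple (hypothetical format)."""
    if isinstance(node, ast.Expression):
        return _convert(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Call) and ast.unparse(node.func) in _FIELD_CALLS:
        if len(node.args) != 1 or node.keywords:
            raise ValueError("field() must take exactly one positional argument")
        return ("col", _convert(node.args[0]))
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _COMPARE_OPS.get(type(node.ops[0]))
        if op is not None:
            return (op, _convert(node.left), _convert(node.comparators[0]))
    if isinstance(node, ast.BinOp):
        op = _BOOLEAN_OPS.get(type(node.op))
        if op is not None:
            return (op, _convert(node.left), _convert(node.right))
    raise ValueError(f"Unsupported expression: {ast.dump(node)}")


def parse_filter(expression_str):
    """Parse a pyarrow-style filter string without evaluating it."""
    return _convert(ast.parse(expression_str, mode="eval"))


print(parse_filter("(pa.compute.field('float_col') <= 40.1)"))
```

A real converter would build QueryBuilder expression nodes where this sketch builds tuples, and would need to cover more node types (negative literals, `is_null`, `isin`, and so on), which are rejected here with a `ValueError`.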

I've decided to keep this conversion code in core arcticdb instead of polars for a few reasons:

  • Gives us more flexibility if we decide to update our QueryBuilder expressions
  • We'll have less code to maintain in a repository which is not ours
  • Automated tests guard against accidentally breaking polars predicate pushdown to arcticdb

For additional reference, see how pyiceberg handles predicate pushdown from polars here.

See also how this might fit into the prototype polars.scan_arcticdb implementation here.

Change Type (Required)

  • [x] Patch (Bug fix or non-breaking improvement)
  • [ ] Minor (New feature, but backward compatible)
  • [ ] Major (Breaking changes)
  • [ ] Cherry pick

Any other comments?

Checklist

Checklist for code changes...
  • [ ] Have you updated the relevant docstrings, documentation and copyright notice?
  • [ ] Is this contribution tested against all ArcticDB's features?
  • [ ] Do all exceptions introduced raise appropriate error messages?
  • [ ] Are API changes highlighted in the PR description?
  • [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

IvoDD, Feb 25 '25 16:02