ArcticDB
Conversion from pyarrow `Expression`s to `QueryBuilder` expressions
Reference Issues/PRs
Monday ref: 8471524478
What does this implement or fix?
Introduces `ExpressionNode._from_pyarrow_expression_str`.
This is required for predicate pushdown integration with polars. For example, given:
lf = polars.scan_arcticdb(arctic_identifier)
lf = lf.filter(pl.col("float_col") <= 40.1)
lf.collect()
When `collect` is called, the filter gets pushed down to our (to be implemented) `polars.scan_arcticdb` via a callback. The filter is given as a string which can be evaluated to a pyarrow expression. For reference, the string construction happens here.
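To make the shape of the problem concrete, here is a minimal sketch of turning such a string into a small expression tree. It is not the code in this PR: it assumes the pushed-down string looks like `(pa.compute.field('float_col') <= 40.1)`, and `parse_predicate`, `_to_tree` and the tuple-based tree are hypothetical names used only for illustration. It walks the Python AST instead of calling `eval`, and covers only column references, literals, comparisons and `&` / `|`.

```python
import ast

# Hypothetical operator tables; only a small subset of what pyarrow supports.
_COMPARE_OPS = {
    ast.Lt: "<", ast.LtE: "<=", ast.Gt: ">",
    ast.GtE: ">=", ast.Eq: "==", ast.NotEq: "!=",
}
_BOOL_OPS = {ast.BitAnd: "&", ast.BitOr: "|"}


def _is_field_call(node):
    """Return True for calls shaped like pa.compute.field('name')."""
    if not isinstance(node, ast.Call) or len(node.args) != 1 or node.keywords:
        return False
    func = node.func
    return (
        isinstance(func, ast.Attribute) and func.attr == "field"
        and isinstance(func.value, ast.Attribute) and func.value.attr == "compute"
        and isinstance(func.value.value, ast.Name) and func.value.value.id == "pa"
        and isinstance(node.args[0], ast.Constant)
        and isinstance(node.args[0].value, str)
    )


def _to_tree(node):
    """Convert an AST node into a nested tuple: (op, left, right) or a leaf."""
    if _is_field_call(node):
        return ("column", node.args[0].value)
    if isinstance(node, ast.Constant):
        return ("value", node.value)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _COMPARE_OPS.get(type(node.ops[0]))
        if op is not None:
            return (op, _to_tree(node.left), _to_tree(node.comparators[0]))
    if isinstance(node, ast.BinOp):
        op = _BOOL_OPS.get(type(node.op))
        if op is not None:
            return (op, _to_tree(node.left), _to_tree(node.right))
    # Anything else (arbitrary calls, unary minus, etc.) is rejected explicitly.
    raise ValueError(f"Unsupported expression: {ast.dump(node)}")


def parse_predicate(predicate):
    """Parse a pyarrow-style predicate string without evaluating it."""
    return _to_tree(ast.parse(predicate, mode="eval").body)


print(parse_predicate("(pa.compute.field('float_col') <= 40.1)"))
# ('<=', ('column', 'float_col'), ('value', 40.1))
```

A tree like this could then be mapped onto `QueryBuilder` expressions one node at a time. Raising on unsupported nodes keeps the failure mode obvious if the string format grows new constructs.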
I've decided to keep this conversion code in core arcticdb instead of polars for a few reasons:
- Gives us more flexibility if we decide to update our `QueryBuilder` expressions
- We'll have less code to maintain in a repository which is not ours
- Automated tests keep us from accidentally breaking polars predicate pushdown to arcticdb
For additional reference, see how pyiceberg handles predicate pushdown from polars here.
See also how this might fit into the prototype `polars.scan_arcticdb` implementation here.
Change Type (Required)
- [x] Patch (Bug fix or non-breaking improvement)
- [ ] Minor (New feature, but backward compatible)
- [ ] Major (Breaking changes)
- [ ] Cherry pick
Any other comments?
Checklist
Checklist for code changes...
- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against all ArcticDB's features?
- [ ] Do all exceptions introduced raise appropriate error messages?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?