feat: Push down `is_between` expressions to Arrow
This allows PyIceberg to use the table metadata to prune data files, speeding up queries significantly.
Resolves #15179
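To illustrate why pushing the range predicate down matters, here is a minimal sketch of metadata-based file pruning. It is not PyIceberg's implementation: `FileStats`, `prune_files`, and the file names are hypothetical, and the sketch assumes the reader has per-file min/max statistics for the filtered column and that the interval is closed on both sides.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    # Hypothetical per-file column statistics, similar in spirit to
    # the min/max bounds a table format keeps in its metadata.
    path: str
    min_value: int
    max_value: int


def prune_files(files, lower, upper):
    """Keep files whose [min, max] range can overlap the closed interval [lower, upper]."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


files = [
    FileStats("a.parquet", 0, 9),
    FileStats("b.parquet", 10, 19),
    FileStats("c.parquet", 20, 29),
]

# Equivalent in spirit to filtering with `col.is_between(12, 21)`:
# a.parquet cannot contain matching rows, so it is never read.
kept = prune_files(files, lower=12, upper=21)
print([f.path for f in kept])  # ['b.parquet', 'c.parquet']
```

If the predicate is not pushed down, the reader never sees the bounds and has to scan every file before filtering.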
Thank you @Fokko. Can you add a test as well?
@ritchie46 For sure! Since I'm more of a Pythonista, could you point me to the right place to add a test?
Codecov Report
Attention: Patch coverage is 81.25%, with 3 lines in your changes missing coverage. Please review.
Project coverage is 81.13%. Comparing base (e3c2b0d) to head (fe72833). Report is 1 commit behind head on main.
:exclamation: Current head fe72833 differs from pull request most recent head bddb48c. Consider uploading reports for the commit bddb48c to get more accurate results.
| Files | Patch % | Lines |
|---|---|---|
| crates/polars-plan/src/logical_plan/pyarrow.rs | 81.25% | 3 Missing :warning: |
Additional details and impacted files
| Coverage Diff | main | #15180 | +/- |
|---|---|---|---|
| Coverage | 81.13% | 81.13% | |
| Files | 1362 | 1362 | |
| Lines | 174820 | 174836 | +16 |
| Branches | 2531 | 2531 | |
| Hits | 141836 | 141854 | +18 |
| Misses | 32500 | 32496 | -4 |
| Partials | 484 | 486 | +2 |
In this function we test several predicate pushdowns in pyarrow datasets: https://github.com/pola-rs/polars/blob/6a873242c84a81c7eecd394b330799a86a6d51d6/py-polars/tests/unit/io/test_pyarrow_dataset.py#L32
@ritchie46 @alexander-beedie All checks are green!
Nice one 👌