feat: Push down `is_between` expressions to Arrow

Open Fokko opened this issue 1 year ago • 4 comments

This allows PyIceberg to leverage the metadata to prune down data files, speeding up the queries significantly.

Resolves #15179
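
To make the pruning idea concrete, here is a minimal, standard-library-only sketch of how a range predicate such as `is_between` can skip data files using per-file min/max metadata. It is not PyIceberg's or Polars' actual implementation; `FileStats`, `may_contain_between`, `prune_files`, and the file names are all hypothetical names chosen for illustration, and the bounds are assumed to be inclusive on both ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: min/max of a single column."""
    path: str
    min_value: int
    max_value: int


def may_contain_between(stats: FileStats, lower: int, upper: int) -> bool:
    """Return True if the file could hold rows with lower <= value <= upper."""
    # Two closed ranges overlap unless one ends before the other starts.
    return stats.min_value <= upper and stats.max_value >= lower


def prune_files(files, lower, upper):
    """Keep only the paths of files whose value range overlaps [lower, upper]."""
    return [f.path for f in files if may_contain_between(f, lower, upper)]


# Hypothetical data files with non-overlapping value ranges.
files = [
    FileStats("part-0.parquet", 0, 9),
    FileStats("part-1.parquet", 10, 19),
    FileStats("part-2.parquet", 20, 29),
]

# Only files that can contain values in [12, 21] need to be read.
selected = prune_files(files, 12, 21)
print(selected)  # ['part-1.parquet', 'part-2.parquet']
```

The check only rules files out; a selected file may still contain no matching rows, so the predicate still has to be evaluated on the data that is read.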

Fokko avatar Mar 20 '24 08:03 Fokko

Thank you @Fokko. Can you add a test as well?

ritchie46 avatar Mar 20 '24 09:03 ritchie46

@ritchie46 For sure! Since I'm more of a Pythonista, could you point me to the right place to add a test?

Fokko avatar Mar 20 '24 09:03 Fokko

Codecov Report

Attention: Patch coverage is 81.25%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 81.13%. Comparing base (e3c2b0d) to head (fe72833). Report is 1 commit behind head on main.

:exclamation: Current head fe72833 differs from pull request most recent head bddb48c. Consider uploading reports for the commit bddb48c to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| crates/polars-plan/src/logical_plan/pyarrow.rs | 81.25% | 3 Missing :warning: |

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15180   +/-   ##
=======================================
  Coverage   81.13%   81.13%           
=======================================
  Files        1362     1362           
  Lines      174820   174836   +16     
  Branches     2531     2531           
=======================================
+ Hits       141836   141854   +18     
+ Misses      32500    32496    -4     
- Partials      484      486    +2     


codecov[bot] avatar Mar 20 '24 10:03 codecov[bot]

In this function we test several predicate pushdowns in pyarrow datasets: https://github.com/pola-rs/polars/blob/6a873242c84a81c7eecd394b330799a86a6d51d6/py-polars/tests/unit/io/test_pyarrow_dataset.py#L32

ritchie46 avatar Mar 20 '24 10:03 ritchie46

@ritchie46 @alexander-beedie All checks are green!

Fokko avatar Apr 04 '24 18:04 Fokko

> This allows PyIceberg to leverage the metadata to prune down data files, speeding up the queries significantly.

Nice one 👌

alexander-beedie avatar Apr 13 '24 07:04 alexander-beedie