polars
polars copied to clipboard
fix(python): Handle boolean comparisons in Iceberg predicate pushdown
Filtering Iceberg table boolean columns on boolean literals fails trying to parse pa.compute.scalar(True). This updates the Call predicate parser to handle the scalar type.
I originally posted about this in the Discord here which has a few more details: https://discord.com/channels/908022250106667068/1273409357685592159
table = pl.scan_iceberg(iceberg.load_table("codes.icd10cm"))
# is_header is a boolean column in the Iceberg table
# Both of these statements fail in the same way
table.filter(pl.col("is_header") == False).collect()
table.sql("SELECT count(*) from self where is_header = false").collect()
The error was
File .venv/lib/python3.12/site-packages/polars/lazyframe/frame.py:2027, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, engine, background, _eager, **_kwargs)
2025 # Only for testing purposes
2026 callback = _kwargs.get("post_opt_callback", callback)
-> 2027 return wrap_df(ldf.collect(callback))
ComputeError: ValueError: Could not convert predicate to PyIceberg: (pa.compute.field('is_header') == pa.compute.scalar(False))
which I traced back to the function that I've modified in this PR.
In addition to the parsing test I added here, it might also advisable to update the Iceberg table fixture as well and modify test_scan_iceberg_filter_on_column. But I'm not really sure what's the proper way to add a column to the fixture.
CI failures in some Delta table tests which seem unrelated.
@dimfeld can you rebase and make CI green?
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 79.89%. Comparing base (
d12131a) to head (69fc732). Report is 1124 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #18199 +/- ##
==========================================
- Coverage 79.89% 79.89% -0.01%
==========================================
Files 1495 1495
Lines 200214 200216 +2
Branches 2867 2868 +1
==========================================
- Hits 159966 159955 -11
- Misses 39702 39715 +13
Partials 546 546
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Looks like we're all good now. Thanks!
Wow, totally missed this one. Merging! :see_no_evil: