polars icon indicating copy to clipboard operation
polars copied to clipboard

fix(python): Handle boolean comparisons in Iceberg predicate pushdown

Open dimfeld opened this issue 1 year ago • 4 comments

Filtering Iceberg table boolean columns on boolean literals fails trying to parse pa.compute.scalar(True). This updates the Call predicate parser to handle the scalar type.

I originally posted about this in the Discord here which has a few more details: https://discord.com/channels/908022250106667068/1273409357685592159

table = pl.scan_iceberg(iceberg.load_table("codes.icd10cm"))

# is_header is a boolean column in the Iceberg table
# Both of these statements fail in the same way
table.filter(pl.col("is_header") == False).collect()
table.sql("SELECT count(*) from self where is_header = false").collect()

The error was

File .venv/lib/python3.12/site-packages/polars/lazyframe/frame.py:2027, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, engine, background, _eager, **_kwargs)
       2025 # Only for testing purposes
       2026 callback = _kwargs.get("post_opt_callback", callback)
    -> 2027 return wrap_df(ldf.collect(callback))
    
    ComputeError: ValueError: Could not convert predicate to PyIceberg: (pa.compute.field('is_header') == pa.compute.scalar(False))

which I traced back to the function that I've modified in this PR.

In addition to the parsing test I added here, it might also advisable to update the Iceberg table fixture as well and modify test_scan_iceberg_filter_on_column. But I'm not really sure what's the proper way to add a column to the fixture.

dimfeld avatar Aug 15 '24 00:08 dimfeld

CI failures in some Delta table tests which seem unrelated.

dimfeld avatar Aug 15 '24 00:08 dimfeld

@dimfeld can you rebase and make CI green?

ritchie46 avatar Aug 27 '24 07:08 ritchie46

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 79.89%. Comparing base (d12131a) to head (69fc732). Report is 1124 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18199      +/-   ##
==========================================
- Coverage   79.89%   79.89%   -0.01%     
==========================================
  Files        1495     1495              
  Lines      200214   200216       +2     
  Branches     2867     2868       +1     
==========================================
- Hits       159966   159955      -11     
- Misses      39702    39715      +13     
  Partials      546      546              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Aug 27 '24 19:08 codecov[bot]

Looks like we're all good now. Thanks!

dimfeld avatar Aug 27 '24 19:08 dimfeld

Wow, totally missed this one. Merging! :see_no_evil:

ritchie46 avatar Jan 25 '25 12:01 ritchie46