Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled

Open itsjunetime opened this issue 1 year ago • 0 comments

Which issue does this PR close?

I think this should close #4028

Rationale for this change

As far as I can tell, this follows the recommendations stated in #4028. The only part that I'm confused about is specifically the last requirement for this API: "The predicate is fully pushed down by ParquetExec (not all predicates are supported)".

To fulfill this requirement, this PR pulls in code that used to reside in FilterCandidateBuilder that recurses through an expr and checks if any column:

  • Is not contained inside the provided schemas (as it must then be a projected column), OR
  • Contains a nested datatype, such as a struct or a list.

If any column in the given expression meets either of these two conditions, we assume that the expression can't be pushed down, and return Inexact for this API (we also return Inexact if we're not using parquet or if pushdown is not enabled, of course).
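To make the check described above concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this PR: `DataType`, `Expr`, `PushDown`, `can_push_down`, and `supports_filter` are hypothetical, simplified stand-ins for the Arrow/DataFusion types and for `TableProviderFilterPushDown`, and the schema is modeled as a plain `HashMap` of column name to type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Arrow's `DataType`; only the flat/nested
/// distinction matters for this sketch.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct,
    List,
}

impl DataType {
    fn is_nested(&self) -> bool {
        matches!(self, DataType::Struct | DataType::List)
    }
}

/// Hypothetical, heavily simplified expression tree.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(i64),
    Binary(Box<Expr>, &'static str, Box<Expr>),
}

/// Hypothetical stand-in for `TableProviderFilterPushDown`.
#[derive(Debug, PartialEq)]
enum PushDown {
    Exact,
    Inexact,
}

/// Recurses through `expr` and returns false as soon as a column is either
/// missing from the schema or has a nested type.
fn can_push_down(expr: &Expr, schema: &HashMap<String, DataType>) -> bool {
    match expr {
        Expr::Column(name) => match schema.get(name) {
            Some(data_type) => !data_type.is_nested(),
            None => false, // not in the schema: assume it cannot be pushed down
        },
        Expr::Literal(_) => true,
        Expr::Binary(left, _, right) => {
            can_push_down(left, schema) && can_push_down(right, schema)
        }
    }
}

/// Exact only when the source is parquet, pushdown is enabled, and every
/// column in the filter passes the check above; Inexact otherwise.
fn supports_filter(
    expr: &Expr,
    schema: &HashMap<String, DataType>,
    is_parquet: bool,
    pushdown_enabled: bool,
) -> PushDown {
    if is_parquet && pushdown_enabled && can_push_down(expr, schema) {
        PushDown::Exact
    } else {
        PushDown::Inexact
    }
}
```

Under these assumptions, a filter such as `a > 1` over a flat `Int64` column `a` yields `Exact`, while the same filter over a struct column, over a column absent from the schema, or with pushdown disabled yields `Inexact`.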

I'm a little worried about this because I was under the impression that more complex requirements determined whether or not an expression could be pushed down (specifically for more complex expressions, e.g. AND/OR exprs have more complex rules that single-column exprs don't have to worry about). Those rules don't seem to be reflected in the code that I could find, though, so I might be being a bit paranoid. I'm not exactly certain and would appreciate some feedback on that specifically :)

What changes are included in this PR?

Code is mostly just moved around to facilitate the changes described above.

Are these changes tested?

Yes - I've added a few tests to make sure all the changes I'm aware of are now tested.

Are there any user-facing changes?

No API changes, but I think the affected ListingTable API is public-facing, so this will be a slight behavioral change.

itsjunetime, Aug 23 '24 18:08