Support "A column is known to be entirely NULL" in `PruningPredicate`
Part of #9171
Rationale for this change
What changes are included in this PR?
- Add a new method, `PruningPredicate::row_counts()`, to get the total row count in each container.
- Use the information from `PruningPredicate::row_counts()` and `PruningPredicate::null_counts()` to determine containers where columns are entirely NULL. This is done by wrapping a `CASE` expression around the pruning predicate: `CASE WHEN x_null_count = x_row_count THEN false ELSE <current_pruning_predicate> END`
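As a rough illustration of the wrapping described above, the `CASE` guard can be read as a short-circuit in front of the existing check. The sketch below is plain Rust and does not use DataFusion types: `ColumnCounts` and `keep_container` are hypothetical names, and the closure stands in for `<current_pruning_predicate>`. It assumes a comparison predicate such as `x = 10`, which no NULL value can satisfy.

```rust
/// Hypothetical per-container counts for one column (not a DataFusion type).
#[derive(Debug, Clone, Copy)]
struct ColumnCounts {
    null_count: u64,
    row_count: u64,
}

/// Mirrors `CASE WHEN x_null_count = x_row_count THEN false
/// ELSE <current_pruning_predicate> END`.
/// Returns true when the container must be kept (it may hold matching rows).
fn keep_container(
    counts: ColumnCounts,
    current_pruning_predicate: impl FnOnce() -> bool,
) -> bool {
    if counts.null_count == counts.row_count {
        // Every value is NULL, so a comparison predicate cannot match any row.
        false
    } else {
        // Otherwise fall back to the existing min/max based check.
        current_pruning_predicate()
    }
}
```

With `null_count` equal to `row_count`, `keep_container` returns `false` without evaluating the closure; in every other case the result is whatever the existing predicate returns.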
Example 1
If a query has a predicate like:
x = 10
then instead of rewriting it to
x_min <= 10 AND 10 <= x_max
the pruning predicate is now rewritten to something like
CASE
WHEN x_null_count = x_row_count THEN false
ELSE x_min <= 10 AND 10 <= x_max
END
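To make the difference in Example 1 concrete, here is a minimal sketch that evaluates both forms against made-up statistics. `XStats`, `keep_old`, and `keep_new` are hypothetical names for illustration only. The sketch also assumes that a container with no known min/max (for example because every value is NULL) cannot be ruled out by the old form and is therefore kept.

```rust
/// Hypothetical per-container statistics for an integer column `x`.
/// These names are illustrative, not DataFusion types.
struct XStats {
    x_min: Option<i64>, // None when no minimum is known
    x_max: Option<i64>, // None when no maximum is known
    x_null_count: u64,
    x_row_count: u64,
}

/// Previous form: `x_min <= v AND v <= x_max`.
/// Assumption: unknown min/max cannot rule the container out, so it is kept.
fn keep_old(s: &XStats, v: i64) -> bool {
    match (s.x_min, s.x_max) {
        (Some(min), Some(max)) => min <= v && v <= max,
        _ => true,
    }
}

/// New form: the CASE guard runs first, then falls back to the old check.
fn keep_new(s: &XStats, v: i64) -> bool {
    if s.x_null_count == s.x_row_count {
        false
    } else {
        keep_old(s, v)
    }
}
```

Under these assumptions, an all-NULL container is kept by `keep_old` but pruned by `keep_new`, while containers with usable min/max values behave the same in both forms.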
Example 2
Another, more complicated example. If a query has a predicate like:
x < 5 AND x > 0 OR y = 10
then instead of rewriting it to
x_min < 5 AND 0 < x_max OR (y_min <= 10 AND 10 <= y_max)
the pruning predicate is now rewritten to something like
# x < 5
CASE
WHEN x_null_count = x_row_count THEN false
ELSE x_min < 5
END
AND
# x > 0
CASE
WHEN x_null_count = x_row_count THEN false
ELSE 0 < x_max
END
OR
# y = 10
CASE
WHEN y_null_count = y_row_count THEN false
ELSE y_min <= 10 AND 10 <= y_max
END
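The structure of Example 2 can be sketched as follows: each comparison gets its own guard, and the guarded terms are combined with the same `AND`/`OR` precedence as the original predicate. `Counts`, `guarded`, and `keep` are hypothetical names, and the three boolean arguments stand in for the results of the per-term min/max comparisons, which are not modelled here.

```rust
/// Hypothetical null/row counts for one column in one container.
#[derive(Clone, Copy)]
struct Counts {
    null_count: u64,
    row_count: u64,
}

/// One guarded term:
/// `CASE WHEN null_count = row_count THEN false ELSE inner END`.
fn guarded(c: Counts, inner: bool) -> bool {
    c.null_count != c.row_count && inner
}

/// Combines the guarded terms as in Example 2:
/// (guarded `x < 5` AND guarded `x > 0`) OR guarded `y = 10`.
fn keep(x: Counts, y: Counts, x_lt_5: bool, x_gt_0: bool, y_eq_10: bool) -> bool {
    (guarded(x, x_lt_5) && guarded(x, x_gt_0)) || guarded(y, y_eq_10)
}
```

Because each term is guarded separately, a container where `x` is entirely NULL is still kept when the `y = 10` branch may match, and is pruned only when that branch is ruled out as well.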
Are these changes tested?
Yes, updated and added more test coverage
Are there any user-facing changes?
Yes, there is a new API for `PruningPredicate` called `PruningPredicate::row_counts()`.
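For readers who want a feel for how per-container row counts and null counts combine, here is a small self-contained sketch. The trait `ContainerCounts`, the struct `FixedCounts`, and the helper `entirely_null` are all hypothetical; they are not the DataFusion API, and the real method signatures may differ.

```rust
/// Hypothetical stand-in for a statistics provider. This is NOT the
/// DataFusion API; the real signatures may differ.
trait ContainerCounts {
    /// Total number of rows in each container, if known.
    fn row_counts(&self, column: &str) -> Option<Vec<Option<u64>>>;
    /// Number of NULL values in each container, if known.
    fn null_counts(&self, column: &str) -> Option<Vec<Option<u64>>>;
}

/// Toy provider that reports the same counts for every column.
struct FixedCounts {
    rows: Vec<Option<u64>>,
    nulls: Vec<Option<u64>>,
}

impl ContainerCounts for FixedCounts {
    fn row_counts(&self, _column: &str) -> Option<Vec<Option<u64>>> {
        Some(self.rows.clone())
    }
    fn null_counts(&self, _column: &str) -> Option<Vec<Option<u64>>> {
        Some(self.nulls.clone())
    }
}

/// Returns, per container, whether `column` is known to be entirely NULL.
/// A container with an unknown count yields `false`; if either set of
/// counts is unavailable, the result is an empty vector.
fn entirely_null(stats: &dyn ContainerCounts, column: &str) -> Vec<bool> {
    match (stats.row_counts(column), stats.null_counts(column)) {
        (Some(rows), Some(nulls)) => rows
            .iter()
            .zip(nulls.iter())
            .map(|(r, n)| matches!((r, n), (Some(r), Some(n)) if r == n))
            .collect(),
        _ => Vec::new(),
    }
}
```

Treating unknown counts as "not entirely NULL" is the conservative choice in this sketch: a container is pruned only when both counts are known and equal.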