Support bounds evaluation for temporal data types
Which issue does this PR close?
As discussed in 14237 temporal data types should be supported in bounds evaluation.
- Closes #14237.
Rationale for this change
We want to extend the number of expressions that can evaluate bounds to calculate statistics and enable better/more query plan optimisations.
What changes are included in this PR?
- Additional data types are supported:
Timestamp,Date,Time,Duration,Interval - Additional binary operators are supported: ~
NotEq~,IsDistinctFrom,IsNotDistinctFrom - Whether an expression supports bounds evaluation or not is now part of the
PhysicalExprtrait and therefore decided by every expression type itself. This will also enable UDFs to implement bounds evaluation.
Are these changes tested?
yes
Are there any user-facing changes?
PhysicalExpr and ScalarUDFImpl change, but both provide default implementations for additional functions.
Thank you @ch-sc for working on this. When you need a review, I can do that if you ping me.
Thanks @berkaysynnada! Yeah I'd like to get a review.
There is still a bug, though, that I don't understand yet. Sometimes the sort operator gets removed as can be seen in the repartition_scan.slt test.
Sometimes the sort operator gets removed as can be seen in the
repartition_scan.slttest.
Seems like you're losing the order requirement somehow
@berkaysynnada do you have time to take another look? :)
NotEq leads to the removal of the sort operator.
I debugged into this and noticed that the EnforceSorting optimiser removes SortExec. enforce_sorting/mod.rs:415
However, prior to that the order requirement somehow is lost and I don't understand why that happens.
I removed NotEq again from the supported binary operators to move this forward. We might want to look into this in a follow-up PR. WDYT?
@berkaysynnada do you have time to take another look? :)
Thanks again for driving this forward :) I'm a bit busy these days, but I'd love to go over it. If it's not urgent for you, would you mind waiting until the weekend?
I removed
NotEqagain from the supported binary operators to move this forward. We might want to look into this in a follow-up PR. WDYT?
Thanks for the heads-up. I try to add support for NotEq. If I find a solution, I can fix it within this PR itself—if that's okay with you.
I took a quick look, and you can use the union logic there: https://github.com/Fly-Style/datafusion/blob/fee3023a63227f0a22ac2da1d040d373cc028c34/datafusion/expr-common/src/interval_arithmetic.rs#L667 It seems to follow a better style, IMO.
I try to add support for NotEq. If I find a solution, I can fix it within this PR itself—if that's okay with you.
Sure, feel free to make any adjustment as you see fit.
I took a quick look, and you can use the union logic there
I thought a bit more about the union logic and came up with another solution.
Sometimes intervals are not overlapping and sometimes have no bound in one direction (NULL). Typically the union of non-overlapping intervals would result in a set of intervals, but there is no support of this.
Here are some examples of what I think should be correct, without interval sets support:
(1, 2) ∪ (3,4) = (1,4)
(1, NULL) ∪ (NULL, NULL) = (1, NULL)
(NULL, 1) ∪ (NULL, NULL) = (NULL, 1)
(3, NULL) ∪ (NULL, 1) = (NULL, NULL)
(5, NULL) ∪ (1, 2) = (1, NULL)
I adapted the union logic accordingly and added tests. Happy for your feedback!
Looks like I confused the meaning of nulls in intervals. I switched to the union logic @berkaysynnada suggested.
@berkaysynnada thank you for your review 🙂
Rather than introducing new supports_() API's, should we force the users to infer the support by the evaluate API?
I agree, something like that would be ideal. I'll look into it today
Hi @ch-sc. Do you plan to complete the work here? (as it's so almost)
Hi @berkaysynnada, I should be able to spend some time on this at the end of this week.
Hi @berkaysynnada, can you take another look to move this forward? :)
Hi @berkaysynnada, can you take another look to move this forward? :)
I will do but some tests are failing. I'll have some time probably in the evening or tomorrow morning
Hi @ch-sc. Reviewing while tests are failing is not very efficient. Are you dealing with them?
Sorry @berkaysynnada, I got side-tracked from this yesterday. The test is fixed.
Thank you @ch-sc. I will review it today
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.