datafusion
datafusion copied to clipboard
Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion
Which issue does this PR close?
Closes #10261 Closes #10241 Part of #10507
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
Thanks @jayzhan211 - I will check this out tomorrow
I am sorry I ran out of time to review this yesterday. I plan to do so later this morning
I made a PR with just your wonderful tests in https://github.com/apache/datafusion/pull/10334 which I think will make this PR easier to evaluate
Marking as draft as I think we are pursuing an alternate strategy, https://github.com/apache/datafusion/issues/10423, now
In general I am concerned about the potential downstream effects of this change. I don't fully understand them
What I would ideally like to do is to run the influxdb_iox regression tests with this change and see what happens.
I am not sure I have the time to do that in the next week -- maybe @appletreeisyellow does 🤔
Basically I worry this is just changing behavior rather than fixing a bug and will result in churn for no benefit downstream. I may be mis understanding the change and rationale however
I believe the coercion rule is quite messy as it currently stands. It would be more understandable and maintainable to move the coercion rule from coerced_from to each individual function. This way, it would be clear which coercion rule applies to each function or signature. Having a single function handle multiple coercion rules makes the code harder to reason about and could cause downstream issues, as you have pointed out. I've also filed an issue to track this at @10507.
Would it be better to start by removing the rule from coerce_from and moving it to the specific function or signature?
I am not sure I have the time to do that in the next week -- maybe @appletreeisyellow does 🤔
@alamb I'm happy to coordinate with our performance team and run an influxdb_iox regression tests if needed
I don't think this needs a performance test -- I was thinking just the main test suite
I believe the coercion rule is quite messy as it currently stands. It would be more understandable and maintainable to move the coercion rule from coerced_from to each individual function. This way, it would be clear which coercion rule applies to each function or signature. Having a single function handle multiple coercion rules makes the code harder to reason about and could cause downstream issues, as you have pointed out. I've also filed an issue to track this at @10507.
Would it be better to start by removing the rule from coerce_from and moving it to the specific function or signature?
Hi @jayzhan211 -- I am sorry I haven't had a chance to devote more time to this and review. I will try and find some time in the next few days
I don't think this needs a performance test -- I was thinking just the main test suite
I was thinking performance regression 😅 Got it! I can put up an Influxdata test against this branch, sometime this afternoon
Looks good, and existing tests are passing 👍 🚀
Thanks @alamb and @appletreeisyellow
Thank you for sticking with it @jayzhan211 -- so much is going on!@