datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion

Open jayzhan211 opened this issue 10 months ago • 11 comments

Which issue does this PR close?

Closes #10261 Closes #10241 Part of #10507

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

jayzhan211 avatar Apr 27 '24 23:04 jayzhan211

Thanks @jayzhan211 - I will check this out tomorrow

alamb avatar Apr 29 '24 20:04 alamb

I am sorry I ran out of time to review this yesterday. I plan to do so later this morning

alamb avatar May 01 '24 13:05 alamb

I made a PR with just your wonderful tests in https://github.com/apache/datafusion/pull/10334 which I think will make this PR easier to evaluate

alamb avatar May 01 '24 19:05 alamb

Marking as draft as I think we are pursuing an alternate strategy, https://github.com/apache/datafusion/issues/10423, now

alamb avatar May 08 '24 17:05 alamb

In general I am concerned about the potential downstream effects of this change. I don't fully understand them

What I would ideally like to do is to run the influxdb_iox regression tests with this change and see what happens.

I am not sure I have the time to do that in the next week -- maybe @appletreeisyellow does 🤔

alamb avatar May 21 '24 20:05 alamb

Basically I worry this is just changing behavior rather than fixing a bug and will result in churn for no benefit downstream. I may be mis understanding the change and rationale however

alamb avatar May 21 '24 20:05 alamb

I believe the coercion rule is quite messy as it currently stands. It would be more understandable and maintainable to move the coercion rule from coerced_from to each individual function. This way, it would be clear which coercion rule applies to each function or signature. Having a single function handle multiple coercion rules makes the code harder to reason about and could cause downstream issues, as you have pointed out. I've also filed an issue to track this at @10507.

Would it be better to start by removing the rule from coerce_from and moving it to the specific function or signature?

jayzhan211 avatar May 21 '24 23:05 jayzhan211

I am not sure I have the time to do that in the next week -- maybe @appletreeisyellow does 🤔

@alamb I'm happy to coordinate with our performance team and run an influxdb_iox regression tests if needed

appletreeisyellow avatar May 22 '24 04:05 appletreeisyellow

I don't think this needs a performance test -- I was thinking just the main test suite

alamb avatar May 22 '24 15:05 alamb

I believe the coercion rule is quite messy as it currently stands. It would be more understandable and maintainable to move the coercion rule from coerced_from to each individual function. This way, it would be clear which coercion rule applies to each function or signature. Having a single function handle multiple coercion rules makes the code harder to reason about and could cause downstream issues, as you have pointed out. I've also filed an issue to track this at @10507.

Would it be better to start by removing the rule from coerce_from and moving it to the specific function or signature?

Hi @jayzhan211 -- I am sorry I haven't had a chance to devote more time to this and review. I will try and find some time in the next few days

alamb avatar May 22 '24 15:05 alamb

I don't think this needs a performance test -- I was thinking just the main test suite

I was thinking performance regression 😅 Got it! I can put up an Influxdata test against this branch, sometime this afternoon

appletreeisyellow avatar May 22 '24 17:05 appletreeisyellow

Looks good, and existing tests are passing 👍 🚀

alamb avatar May 26 '24 10:05 alamb

Thanks @alamb and @appletreeisyellow

jayzhan211 avatar May 26 '24 12:05 jayzhan211

Thank you for sticking with it @jayzhan211 -- so much is going on!@

alamb avatar May 27 '24 12:05 alamb