Only recompute schema in `TypeCoercion` when necessary
Draft as it builds on https://github.com/apache/datafusion/pull/10356
Which issue does this PR close?
closes https://github.com/apache/datafusion/issues/10365
Rationale for this change
recomputing the Schema can be expensive, so only do it when necessary
What changes are included in this PR?
- Update TypeCoercionRewriter to track when the expression is actually rewritten (via
Transformed::yes) - Only recompute the schema when an expression was rewritten
This is a big change change as it needs to is properly track when expressions were changed which required updating the entire expression rewrite
Are these changes tested?
Existing CI
Are there any user-facing changes?
No functional changes
TODO performance tests
There are a few tests that don't pass for reason's I haven't debugged yet but it looks like recomputing schemas may be masking issues elsewhere. Once https://github.com/apache/datafusion/pull/10356 is merged I'll do another profiling run to figure out if recomputing schema shows up in the traces and will keep hacking on this if so
Update here is I don't think I'll have a chance to work on this in the near term unless recomputing the schema shows up in planning benchmarks again.
I think given the level of effort involved, this approach is not likely to yield a large speedup (though I do think this PR's approach would be an improvement over the current code)
I realistically don't plan to spend any more time on this :(