datafusion Add hooks to `SchemaAdapter` to add custom column generators

Closes #15220

A lot of the work of this PR is meant to resolve https://github.com/apache/datafusion/issues/15220#issuecomment-2727534085. I think I'll move that into a standalone PR.

Mar 16 '25 20:03 adriangb

I've moved the complex bit over to https://github.com/apache/datafusion/pull/15263. I'll let that settle first then resume work here.

Mar 17 '25 01:03 adriangb

Noting that in https://github.com/apache/datafusion/pull/15263#discussion_r1997816085 I realized it might be good to have a way to report stats for generated columns before they are generated (is the column all nulls? is it a constant?) so that they can be used for stats pruning.
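To make the idea concrete, here is a minimal sketch of what "stats known before generation" could look like and how a pruning check might consume them. `GeneratedColumnStats` and `can_prune_eq` are hypothetical names invented for illustration, not DataFusion types or functions, and the value type is simplified to `i64`.

```rust
/// Hypothetical summary of what is known about a generated column
/// before any of its values are produced. Not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
enum GeneratedColumnStats {
    /// Every row will be NULL (for example, a column missing from the file).
    AllNull,
    /// Every row will hold the same non-null value.
    Constant(i64),
    /// Nothing is known ahead of time.
    Unknown,
}

/// Returns true when the predicate `column = literal` cannot be true for
/// any row of the file, so the whole file could be skipped.
fn can_prune_eq(stats: &GeneratedColumnStats, literal: i64) -> bool {
    match stats {
        // NULL = literal evaluates to NULL, which a filter treats as false.
        GeneratedColumnStats::AllNull => true,
        // A constant column can only match its own value.
        GeneratedColumnStats::Constant(v) => *v != literal,
        // Without stats we must keep the file.
        GeneratedColumnStats::Unknown => false,
    }
}
```

With this shape, a file whose generated column is `Constant(7)` would be skipped for `col = 3` but kept for `col = 7`, and an `Unknown` column never causes pruning.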

Mar 17 '25 01:03 adriangb

Now that https://github.com/apache/datafusion/pull/15263 is merged I'll come back here and:

- Resolve conflicts.
- Add an API for the SchemaAdapter to declare the stats for potentially generated columns if they are known ahead of time.

Mar 19 '25 20:03 adriangb

Marking as ready for review. The main TODO is an API for transmitting statistics information for generated columns before they get generated, but that can even be a followup PR.

Mar 20 '25 17:03 adriangb

Looking at how filter pushdown interacts with partition columns, I think this could improve that. Currently the partition values get bound when the FileStream is created, which is after predicate pushdown is applied. This means that partition values are not available during predicate pushdown, so filters that depend on both the partition values and the data are evaluated in a separate FilterExec instead.

I feel like this change could help with that... but some details are missing: we somehow need to pipe the partition values into the FileSource so that it can in turn pass in the info to generate the partition columns on the fly if needed. Or something like that...
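As a rough illustration of what "piping the partition values in" could enable, the sketch below substitutes a file's partition values into a predicate before it is pushed down. The `Expr` enum and `bind_partition_values` are hypothetical stand-ins written for this example, not DataFusion's expression types or API, and NULL handling is deliberately ignored.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Bool(bool),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Replaces references to partition columns with this file's values and
/// folds any comparison that becomes constant. NULLs are not modelled.
fn bind_partition_values(expr: &Expr, parts: &HashMap<String, i64>) -> Expr {
    match expr {
        Expr::Column(name) => match parts.get(name) {
            Some(v) => Expr::Literal(*v),
            None => expr.clone(),
        },
        Expr::Eq(l, r) => {
            let l = bind_partition_values(l, parts);
            let r = bind_partition_values(r, parts);
            match (&l, &r) {
                // Both sides are now known: fold to a boolean.
                (Expr::Literal(a), Expr::Literal(b)) => Expr::Bool(a == b),
                _ => Expr::Eq(Box::new(l), Box::new(r)),
            }
        }
        Expr::And(l, r) => {
            let l = bind_partition_values(l, parts);
            let r = bind_partition_values(r, parts);
            match (l, r) {
                // A false conjunct makes the whole predicate false.
                (Expr::Bool(false), _) | (_, Expr::Bool(false)) => Expr::Bool(false),
                // A true conjunct can be dropped.
                (Expr::Bool(true), other) | (other, Expr::Bool(true)) => other,
                (l, r) => Expr::And(Box::new(l), Box::new(r)),
            }
        }
        Expr::Literal(_) | Expr::Bool(_) => expr.clone(),
    }
}
```

Under these assumptions, a predicate that folds to `Bool(false)` means the file can be skipped, and a mixed filter such as `year = a` becomes `2024 = a` for a file in the `year=2024` partition, which references only file columns and so could be evaluated during the scan.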

Apr 30 '25 19:04 adriangb

Marking as draft until I have time to work on this.

Jun 27 '25 18:06 adriangb

I'm proposing we replace SchemaAdapter in https://github.com/apache/datafusion/issues/16800, so I don't plan to work on this PR anymore.

Jul 20 '25 17:07 adriangb