substrait
Project relation: do we always need to just add columns?
Project relations are currently defined as adding columns to the output: the relation passes through every input column and appends one column per expression. Two Substrait implementations, DuckDB and DataFusion, do not implement project relations this way; they output only the expression results. The semantics of a plan still work if you keep this in mind: instead of using emit to remove the columns you don't want, you add expressions for the columns you wish to keep. Both approaches have advantages. Not copying lets you get just the fields you want without identifying them explicitly in the output mapping, while copying makes it easier to add a single additional column.
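To make the difference concrete, here is a minimal sketch of the two semantics. It is a toy model, not Substrait's protobuf API: rows are Python tuples, an expression is assumed to be a callable evaluated against one input row, and `project_append`, `project_no_copy` and `field` are hypothetical names used only for illustration.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model: an expression is a function of one input row.
Expr = Callable[[tuple], object]


def field(i: int) -> Expr:
    """Hypothetical helper: a direct reference to input column i."""
    return lambda row: row[i]


def project_append(row: tuple, exprs: Sequence[Expr],
                   emit: Optional[Sequence[int]] = None) -> tuple:
    """Copying semantics: input columns followed by the expression results.
    An optional emit mapping then selects and reorders the output columns."""
    full = tuple(row) + tuple(e(row) for e in exprs)
    if emit is None:
        return full
    return tuple(full[i] for i in emit)


def project_no_copy(row: tuple, exprs: Sequence[Expr]) -> tuple:
    """No-copy semantics: the output is only the expression results."""
    return tuple(e(row) for e in exprs)


row = ("a", 10, 20)
total = lambda r: r[1] + r[2]  # a computed column: column 1 + column 2

print(project_append(row, [total]))                   # ('a', 10, 20, 30)
print(project_append(row, [total], emit=[0, 3]))      # ('a', 30)
print(project_no_copy(row, [field(0), total]))        # ('a', 30)
```

Both plans produce the same output; the copying version names the kept column in the emit mapping, while the no-copy version names it as an extra field-reference expression.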
I'm told that correcting the column handling in one engine would require a substantial amount of work (likely weeks), because it affects the bookkeeping in every relation. Are there any potential alternatives, such as additionally defining no-copy project relation semantics (perhaps as an option in common)? Or would this make the landscape more complicated for Substrait producers?
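One alternative worth weighing is translating between the two forms at the plan boundary rather than changing the engine internals. The sketch below assumes the same toy model as before (expressions as callables over a row) and uses hypothetical names (`to_no_copy`, `to_append`); it is not how DuckDB or DataFusion actually consume plans. Because the translated project produces the same output columns in the same order, field indices in downstream relations would not need to change.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical model: an expression is a function of one input row.
Expr = Callable[[tuple], object]


def field(i: int) -> Expr:
    """Hypothetical helper: a direct reference to input column i."""
    return lambda row: row[i]


def to_no_copy(num_input_fields: int, exprs: Sequence[Expr],
               emit: Optional[Sequence[int]] = None) -> List[Expr]:
    """Consumer side: rewrite a copying project (exprs + optional emit)
    into an expression list for an engine that outputs only its expressions."""
    total = num_input_fields + len(exprs)
    mapping = range(total) if emit is None else emit
    out: List[Expr] = []
    for idx in mapping:
        if not 0 <= idx < total:
            raise IndexError(f"emit index {idx} out of range 0..{total - 1}")
        if idx < num_input_fields:
            out.append(field(idx))          # pass-through input column
        else:
            out.append(exprs[idx - num_input_fields])  # computed column
    return out


def to_append(num_input_fields: int,
              exprs: Sequence[Expr]) -> Tuple[List[Expr], List[int]]:
    """Producer side: express a no-copy project with copying semantics by
    emitting only the appended expression columns."""
    emit = [num_input_fields + i for i in range(len(exprs))]
    return list(exprs), emit


row = ("a", 10, 20)
total = lambda r: r[1] + r[2]
exprs = to_no_copy(3, [total], emit=[0, 3])
print(tuple(e(row) for e in exprs))  # ('a', 30)
print(to_append(3, [total])[1])      # [3]
```

In this model both directions are mechanical, which suggests the cost lies mainly in where the translation is done, not in the semantics themselves.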