What is the level of targeted portability of Substrait?

Open ingomueller-net opened this issue 5 months ago • 4 comments

I am trying to understand how rigorous Substrait aims to be with its specification and, hence, how portable it will eventually be. More concretely, do we expect exactly the same results for a given plan regardless of the backend it runs on?

Note that in the SQL standard and most SQL dialects, this isn't the case, at least with respect to the order of the results (including any unsorted intermediate result), even for different executions on the same system. If we do not expect identical results, how exactly do we define the semantics, and hence the set of "correct" results of a plan, and how do we test implementations for compliance? On the issue of result order, the Substrait specification could leave the order undefined, like the SQL standard does, and one could probably come up with acceptance tests that accept any valid execution.
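To make the acceptance-test idea concrete, here is a minimal sketch of an order-insensitive comparison. It assumes results arrive as lists of rows whose values are hashable; the helper name `same_rows_any_order` is hypothetical and not part of Substrait or any consumer.

```python
from collections import Counter


def same_rows_any_order(expected, actual):
    """Return True if both results hold the same rows with the same
    multiplicities, ignoring row order (multiset equality)."""
    return Counter(map(tuple, expected)) == Counter(map(tuple, actual))


# Two executions that return the same rows in a different order.
run_1 = [(1, "a"), (2, "b"), (2, "b")]
run_2 = [(2, "b"), (1, "a"), (2, "b")]
print(same_rows_any_order(run_1, run_2))
```

This only covers unsorted final results. A plan with a sort on a non-unique key, or a limit applied to unsorted input, has a larger set of valid outputs and would need a more permissive check.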

However, there are other issues, such as the exact behavior of DECIMALs, which may lead to different results for reasons unrelated to order. (I just commented on ibis-project/ibis#8195; see details there.) If Substrait's specification said "the decimal operations do whatever the consumer does with decimals", then changing consumers might obviously change the query result. If it instead specified exactly which bits need to be produced, some consumers might need to emulate the specified behavior with more and/or more costly operations, which has its own downsides. What is Substrait's take here?
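As an illustration of the kind of divergence I mean, the sketch below uses Python's `decimal` module to emulate three hypothetical consumers that differ only in result scale and rounding mode. The `divide` helper and the consumer settings are assumptions made for this example; they do not describe any real engine or anything in the Substrait specification.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def divide(a, b, scale, rounding):
    """Divide two decimals and round the quotient to `scale` fractional digits."""
    exponent = Decimal(1).scaleb(-scale)  # scale=2 gives 0.01
    return (Decimal(a) / Decimal(b)).quantize(exponent, rounding=rounding)


# The same expression, 1 / 8 = 0.125, under three hypothetical policies.
consumer_a = divide("1", "8", scale=2, rounding=ROUND_HALF_UP)    # 0.13
consumer_b = divide("1", "8", scale=2, rounding=ROUND_HALF_EVEN)  # 0.12
consumer_c = divide("1", "8", scale=4, rounding=ROUND_HALF_EVEN)  # 0.1250

print(consumer_a, consumer_b, consumer_c)
```

All three outputs are defensible on their own, yet no two are identical, so a bit-exact compliance test would have to pick one policy and every other consumer would have to emulate it.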

ingomueller-net, Feb 06 '24 14:02