Where can we find examples of serializing a view as a plan?

Open mchades opened this issue 1 year ago • 3 comments

The example use cases on the homepage include this interesting line:

Serialize a plan that represents a SQL view for consistent use in multiple systems (e.g. Iceberg views in Spark and Trino)

It's a really awesome example, but I can't find any related Substrait code in Iceberg, Spark, or Trino.

Did I miss something?

BTW, I think it would be friendlier to link each use case on the homepage to an example.

mchades avatar Nov 28 '24 13:11 mchades

I think that these examples are "examples of potential future uses."

Interestingly enough, there has been a discussion on the Iceberg mailing list in the last few weeks to make exactly that envisioned use case a reality.

ingomueller-net avatar Dec 13 '24 08:12 ingomueller-net

Gluten (a Spark plugin) has modified Substrait to read Iceberg files. It is on my list to bring these changes into mainline Substrait at some point:

https://github.com/apache/incubator-gluten/blob/main/gluten-substrait/src/main/resources/substrait/proto/substrait/algebra.proto#L152

EpsilonPrime avatar Dec 13 '24 09:12 EpsilonPrime

More generally speaking, no interesting example currently exists. Making an interesting one depends on a database having an interesting way of querying a view.

I threw together a simple example using ibis and duckdb here: query-duckdb-view

Representing a query of a view can happen in a variety of ways: ReadRel and ExtensionLeafRel are 2 specific operators, but even ReadRel specifies a handful of particular approaches via the oneof read_type group of attributes. The provided example just uses ReadRel.named_table (I think).
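To make the distinction concrete, here is a rough sketch of how a consumer might tell which leaf representation a plan uses. It is not the Substrait library API: the plans are plain Python dicts loosely modeled on the JSON form of a Substrait Rel, `describe_leaf` is a hypothetical helper, the `detail` payload of the extension leaf is made up, and the list of read_type variants may be incomplete for newer Substrait versions.

```python
# Simplified stand-in for the JSON form of a Substrait Rel (an assumption,
# not a validated plan). Keys follow the protobuf JSON camelCase convention.
READ_TYPES = ("virtualTable", "localFiles", "namedTable", "extensionTable")


def describe_leaf(rel: dict) -> str:
    """Return which leaf representation a relation uses (hypothetical helper)."""
    if "extensionLeaf" in rel:
        return "extensionLeaf"
    read = rel.get("read")
    if read is None:
        raise ValueError("not a leaf relation this sketch understands")
    # read_type is a oneof, so exactly one variant should be present.
    present = [key for key in READ_TYPES if key in read]
    if len(present) != 1:
        raise ValueError(f"expected exactly one read_type, found {present}")
    return f"read.{present[0]}"


# A view referenced by name, indistinguishable from a table reference.
view_by_name = {"read": {"namedTable": {"names": ["sales_view"]}}}

# A view referenced through an extension leaf; the payload is invented here.
view_by_extension = {"extensionLeaf": {"detail": {"view": "sales_view"}}}

print(describe_leaf(view_by_name))
print(describe_leaf(view_by_extension))
```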

Then, various systems will likely present views in different ways, though I assume many will resolve them at the catalog level: a "table name" that matches a view name will read from the view and be otherwise transparent.

Altogether, a logical example would be:

  1. Produce a substrait plan that specifies the name of a view in either a ReadRel or an ExtensionLeafRel.
  2. When consuming the substrait plan, either:
    • resolve the view directly (if the plan explicitly mentions a view name)
    • resolve the view indirectly (e.g. if the plan specifies the view via ReadRel.named_table)
  3. The query completes per usual.
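A toy version of these steps, taking the indirect path, might look like the following. This is only a sketch under heavy assumptions: the "plan" is a plain dict shaped like the JSON form of ReadRel.named_table rather than a real serialized Substrait plan, `produce_plan` and `consume_plan` are hypothetical helpers, and SQLite (which does not consume Substrait) stands in for a consumer whose catalog resolves a view name like any table name.

```python
import sqlite3


def produce_plan(view_name: str) -> dict:
    """Step 1: name the view in a (simplified) ReadRel.named_table."""
    return {"read": {"namedTable": {"names": [view_name]}}}


def consume_plan(conn: sqlite3.Connection, plan: dict) -> list:
    """Step 2, indirect path: hand the name to the catalog and let it resolve."""
    names = plan["read"]["namedTable"]["names"]
    # Quote each name part so it is treated as an identifier, not SQL text.
    ident = ".".join('"' + n.replace('"', '""') + '"' for n in names)
    return conn.execute(f"SELECT * FROM {ident}").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 5), (2, 50), (3, 500)])
conn.execute(
    "CREATE VIEW big_orders AS SELECT id, amount FROM orders WHERE amount >= 50"
)

# Step 3: the query completes as usual; the consumer never learns it was a view.
rows = consume_plan(conn, produce_plan("big_orders"))
print(sorted(rows))
```

The direct path would differ only in step 2: the consumer would see an explicit view reference (for example in an ExtensionLeafRel) and look up the view definition itself instead of deferring to the catalog.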

How a producer does (1) and how a consumer does (2) is where you'd get a variety of interesting examples (maybe). If there are particular examples you'd like, maybe you can propose them? I don't use Iceberg, Spark, or Trino, so I don't have an environment in which I can produce examples.

drin avatar Dec 13 '24 22:12 drin