substrait
A cross-platform way to express data transformation, relational algebra, standardized record expression and plans.
Given that `Plan` already supports multiple `Rel` (`repeated Rel relations`), it would be very valuable to capture overlaps among them. This would reduce redundancy, as well as support multi-output...
Type variation anchor declarations identify a variation using only the extension it's defined in and its name: https://github.com/substrait-io/substrait/blob/d4cfbe014e9c126ac008094323a2baca9f47c42d/proto/substrait/extensions/extensions.proto#L45-L55 (note that the comment also says it's a type name rather than...
Currently, a Substrait plan produces a single output corresponding to its root relation. In some cases, multiple outputs are needed, e.g., for multi-table computation as well as for check-point and...
Currently, Ibis has [vectorized UDF nodes](https://github.com/ibis-project/ibis/blob/d7318fdf87121cd8fadbcf0369a2b217aab3053a/ibis/expr/operations/vectorized.py#L11-L50) that are not handled by Ibis-Substrait (see https://github.com/ibis-project/ibis-substrait/issues/236). This issue is specific to the protobuf changes needed.
This came up in yesterday's sync meeting: these options are copy-pasted all over the place but are not really documented anywhere. For one thing, I had misinterpreted the SILENT option....
1. What does `SORT_DIRECTION_CLUSTERED` mean? 2. Let's say I was sorting by strings, how would I specify a [natural sort](https://en.wikipedia.org/wiki/Natural_sort_order) vs alphabetical sort? 3. Can we provide an example of...
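To make the second question concrete, here is a minimal sketch of how alphabetical and natural ordering differ on the same strings. It is plain Python, not Substrait: the `natural_key` helper and the sample file names are hypothetical and for illustration only, and nothing here shows how a Substrait plan would select either ordering.

```python
import re


def natural_key(text):
    """Build a sort key that compares digit runs as integers.

    Hypothetical helper for illustration; not part of Substrait.
    re.split with a capturing group yields alternating non-digit and
    digit tokens, so keys always compare str with str and int with int.
    """
    return [
        int(token) if token.isdigit() else token.lower()
        for token in re.split(r"(\d+)", text)
    ]


names = ["file10", "file2", "file1"]

# Alphabetical (lexicographic) order compares character by character,
# so "file10" sorts before "file2".
alphabetical = sorted(names)

# Natural order compares the embedded numbers by value.
natural = sorted(names, key=natural_key)

print(alphabetical)
print(natural)
```

The two orderings agree on strings without digits and diverge only where digit runs appear, which is why a sort specification would need some way to say which one is intended.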
These nodes are [described](https://substrait.io/relations/physical_relations/#hash-aggregate-operation) in the physical relations section of the site. However, there is no corresponding proto definition. Some of the work done for hash/streaming equijoin may be useful...
In my (admittedly limited) experience, it has been pretty rare that a dataset contains only data files and nothing else (e.g. metadata files, dataset descriptions, etc.). I know we have...
With reference to https://github.com/substrait-io/substrait/issues/138, we can implement the CSV file format by defining the required messages. (Prototype code can be found [here](https://github.com/sanjibansg/substrait/blob/FileFormat/proto/substrait/algebra.proto#L117)) ``` message CSVConvertOptions{ bool ignore_check_utf8 =...
Based on [this slack thread](https://substrait.slack.com/archives/C02D7CTQXHD/p1641374144114700) there has been discussion about moving to a more explicit form of physical properties than the current, relatively implicit formulation. This could reduce behavior requirements...