Support Reading Substrait plans

Open universalmind303 opened this issue 2 years ago • 4 comments

Description

Supporting Substrait plans opens up a lot of possibilities for easily integrating with external tools (such as Polars).

universalmind303, Oct 11 '23 17:10

It would be great if this were supported, as it seems that GlareDB is already compiled with datafusion-substrait built in.

eitsupi, May 26 '24 14:05

@eitsupi what did you have in mind for using this? We never prioritized this because we weren't aware of a use case, or of a tool that people actually use that can generate Substrait plans.

I definitely think that it'd be cool, but I'd like to collect some potential use cases and tools that we could integrate with before working on this directly.

tychoish, May 28 '24 21:05

Fair point. apache/arrow#37504 is what I would like to see implemented.

There are dedicated integrations for the Apache Arrow and DuckDB packages in Python and R^1, but it would be great if this could be generalized to perform pushdown between the various packages. Of course it will take a while for this to happen, so I don't know if it would be worth implementing right away.

eitsupi, May 29 '24 02:05

Some of this sounds a lot like ADBC.

Also, to be clear, parts of this make a lot of sense, particularly being able to take data formats and objects from the integration packages and pass the results of a GlareDB query to, say, a DuckDB query. It should be possible to convert the Arrow batches in such a way that this would work pretty well (or well enough). That is something that could be extended, would not require a lot of work, and makes sense.

Pushing predicates down between packages makes slightly less sense, or at least in most cases you'd want the query engine itself to connect to the underlying data source (and track pushdowns), rather than doing this translation in the bindings.

I can imagine specific integrations between query engines relying on Substrait or another protocol, and if we were to implement that, using Substrait where appropriate could be great (and easily accomplished). But given the existing DataFusion support for Substrait, I don't know if there is anything particularly generic to be done.

tychoish, Jun 11 '24 19:06