Alternative Backends | Support open table projects like Apache Iceberg

Open ramkumarkb opened this issue 1 year ago • 6 comments

Hi,

First of all, thank you for the great project!

I was wondering whether, under "Alternative Backends", integrations with open table formats like Apache Iceberg could be considered or added to the roadmap?

ramkumarkb avatar Dec 31 '22 10:12 ramkumarkb

I've checked out the project, but I don't understand how we would integrate with it.

As I understand, the project defines a data format, not a query language.

aljazerzen avatar Dec 31 '22 12:12 aljazerzen

@aljazerzen - Yes, indeed. Iceberg defines the table data format and, for now, recommends using SQL engines like Apache Spark or Apache Flink to read and write data, as described in their Engine Support document.

So one thought here would be that when PRQL integrates with Dataframes (as mentioned in the PRQL roadmap), one of the (potential) candidates would be Spark Dataframe.

ramkumarkb avatar Jan 01 '23 11:01 ramkumarkb

I see.

So in terms of code, Iceberg has connectors that allow different engines to access tables from other engines/storage locations.

It doesn't specify anything about the query language and leaves that to your query engine. So if you are using anything that takes SQL, you should be able to use PRQL with Iceberg :D
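
To make that concrete, here is a minimal Python sketch of the flow: PRQL is compiled to SQL first, and only the SQL string reaches the engine that talks to Iceberg. The helper `run_prql` and its two callable parameters are hypothetical names made up for illustration; the commented wiring assumes a PRQL compiler binding exposing a `compile` function and a Spark session with an Iceberg catalog already configured.

```python
from typing import Callable, Iterable, List, Tuple


def run_prql(
    prql: str,
    compile_prql: Callable[[str], str],
    execute_sql: Callable[[str], Iterable[Tuple]],
) -> List[Tuple]:
    """Compile a PRQL query to SQL, then hand the SQL to a query engine."""
    # PRQL -> SQL happens before any engine is involved.
    sql = compile_prql(prql)
    # The engine (and its Iceberg connector) only ever sees plain SQL.
    return list(execute_sql(sql))


# Hypothetical wiring, not executed here. Assumes `prql_python` is installed
# and `spark` is a SparkSession with an Iceberg catalog configured:
#
#   rows = run_prql(
#       "from events | take 10",
#       compile_prql=prql_python.compile,
#       execute_sql=lambda sql: spark.sql(sql).collect(),
#   )
```

Passing the compiler and the executor in as plain callables keeps the sketch engine-agnostic, so the same shape would apply to any other engine that accepts SQL.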

But I hear you, a guide on how to do this would be nice to have :D

Do you have a specific query engine in mind or are you asking just in general?

aljazerzen avatar Jan 01 '23 21:01 aljazerzen

My sense is that we've tightened our focus in this repo since that version of the Roadmap, and so will focus on the language here (updated roadmap is https://github.com/PRQL/prql/pull/1374), and leave the execution to other tools, possibly https://github.com/PRQL/prql-query

@snth shall we transfer this issue there?

max-sixty avatar Jan 01 '23 22:01 max-sixty

Sure, I am happy with that.

I am quite interested in Apache Iceberg myself and motivated to support it in pq.

snth avatar Jan 02 '23 10:01 snth

Hi @ramkumarkb,

The next integration for prql-query will probably be Polars. I would like to support Apache Iceberg, and initially that will probably come through the DuckDB backend (I believe the DuckDB team is working on this) and possibly also DataFusion.

snth avatar Jan 23 '23 21:01 snth