Iceberg REST Catalog Support

Open randypitcherii opened this issue 2 years ago • 15 comments

Hey, team!

Very excited about the duckdb v0.9 support for iceberg!

I currently use a rest catalog for my iceberg tables and was hoping to be able to wire up duckdb to that rather than point it to the actual underlying data/metadata files.

If this is available, I'd love to use it -- otherwise, I'd be happy to jump in and start coding if this feature is new.

Thanks!

randypitcherii avatar Sep 26 '23 14:09 randypitcherii

Hi @randypitcherii

Thanks for your interest! The iceberg extension is currently in a quite early stage. The REST catalog is not yet supported, so we are definitely interested in your help there! Feel free to reach out to me through the DuckDB discord for a chat!

samansmink avatar Sep 26 '23 14:09 samansmink

Ok, no worries.

I'm thinking I'll chat with the REST catalog through Python, then get the details to my 🦆 db programmatically.

I'll see you on the discord!!! Thanks!

randypitcherii avatar Sep 26 '23 14:09 randypitcherii
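
The workaround described above (ask the REST catalog from Python, then hand the result to DuckDB) might look roughly like the sketch below. It assumes the load-table route and the `metadata-location` response field from the Iceberg REST catalog specification, and that the extension's `iceberg_scan` function accepts a path to a metadata file. The host, namespace, table name, and helper names are hypothetical, and the HTTP call itself is left out so the snippet runs offline.

```python
import json
from urllib.parse import quote


def load_table_url(base_url: str, namespace: str, table: str, prefix: str = "") -> str:
    """Build the 'load table' URL for a single-level namespace (assumed layout)."""
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", quote(namespace, safe=""), "tables", quote(table, safe="")]
    return "/".join(parts)


def metadata_location(load_table_response: str) -> str:
    """Pull the current metadata file location out of the JSON response body."""
    return json.loads(load_table_response)["metadata-location"]


def iceberg_scan_sql(location: str) -> str:
    """Build a DuckDB query over the metadata file, escaping single quotes."""
    escaped = location.replace("'", "''")
    return f"SELECT * FROM iceberg_scan('{escaped}')"


# Hypothetical catalog host and a hand-written response body for illustration.
url = load_table_url("https://catalog.example.com/", "analytics", "events")
response_body = json.dumps(
    {"metadata-location": "s3://bucket/events/metadata/00001.metadata.json", "metadata": {}}
)
sql = iceberg_scan_sql(metadata_location(response_body))
print(url)
print(sql)
```

In practice the response body would come from an authenticated GET request to `url`, and the resulting SQL would be run on a DuckDB connection with the iceberg extension loaded.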

Good morning @samansmink , is there any plan to support iceberg catalogs in general (not only REST) in the near future?

Thanks in advance.

thinkORo avatar Mar 27 '24 06:03 thinkORo

Hey @thinkORo! I would love to, but I'm a bit low on time currently. In general, I would say we would like to support the most used catalogs at some point, but I cannot give any timeline at the moment. If you are interested in contributing, I'm happy to help out though.

samansmink avatar Mar 27 '24 09:03 samansmink

Hi @samansmink ,

Unfortunately, I'm only really good at Data Management and Data Analytics. And Python. Therefore, I can offer only very limited help in contributing to DuckDB.

But: If I can do something to increase the prioritization, or support you elsewhere to give you more time for such a (really important, at least for me) implementation, I am happy to do so.

thinkORo avatar Mar 27 '24 10:03 thinkORo

I have a framework in place for this if #51 gets merged, see the notes about the REST/Nessie catalog.

It should just be a few more lines of work to perform the HTTP request.

rustyconover avatar Apr 11 '24 01:04 rustyconover

Up! Any updates on this?

astronautas avatar May 30 '24 21:05 astronautas

Not yet -- been working on other things but will return to this soon.

rustyconover avatar May 31 '24 03:05 rustyconover

Please post an update once it is implemented. Really excited to see DuckDB support for the Iceberg REST catalog.

arnabneogi86 avatar Jun 20 '24 14:06 arnabneogi86

While the combination of DuckDB <> PyArrow <> PyIceberg support covers this use-case, the extension is much more efficient than loading the data into PyTable. I would love to see the support for Iceberg catalogs.

buremba avatar Aug 01 '24 00:08 buremba

This integration is super exciting. Any updates on when we might expect it to be available? Looking forward to trying it out.

prasanthkn83 avatar Nov 07 '24 13:11 prasanthkn83

This should be resolved now, correct?

  • https://github.com/duckdb/duckdb-iceberg/pull/98

derekperkins avatar Feb 26 '25 04:02 derekperkins

@derekperkins, we'll see if it will be included in the next release. But #98 sounds very, hmm, suitable to me. But it could be that @randypitcherii has write access in mind as well.

thinkORo avatar Feb 26 '25 13:02 thinkORo

Hey this is great news!

Write would be lovely, but I personally just wanted read access through an iceberg catalog.

ESPECIALLY valuable would be if duck worked with the catalog to push down predicates to only pull back data required for a given query.

Does this help?

randypitcherii avatar Feb 26 '25 16:02 randypitcherii
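
To make the pushdown request above concrete, here is a toy sketch of file pruning from per-file column bounds, which is the kind of statistic Iceberg manifests record for data files. It is not how the extension is implemented; the `DataFile` class, file names, and bounds are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    lower: int  # smallest value of the filter column in this file
    upper: int  # largest value of the filter column in this file


def prune_files(files, min_value):
    """Keep only files that could contain rows with column >= min_value."""
    return [f for f in files if f.upper >= min_value]


# Hypothetical data files with made-up bounds.
files = [
    DataFile("part-0.parquet", lower=1, upper=100),
    DataFile("part-1.parquet", lower=101, upper=200),
    DataFile("part-2.parquet", lower=201, upper=300),
]
kept = prune_files(files, min_value=150)
print([f.path for f in kept])
```

A query with a filter such as `col >= 150` would then only need to read the files that survive pruning, rather than every file in the table.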

Redpanda would be interested in WRITE support to a DuckDB REST catalog as well

mattschumpert avatar Apr 18 '25 20:04 mattschumpert