kedro-plugins kedro-datasets: Add support for Iceberg Tables

Description

Follow-up from documenting Kedro + Iceberg in https://github.com/kedro-org/kedro/pull/4521: add native support for Iceberg tables to kedro-datasets.

Context

We don't have any datasets that support Iceberg tables. The documentation added in https://github.com/kedro-org/kedro/pull/4521 is fairly minimal, and the approach it describes has limitations:

It only works for pandas DataFrames
It uses PyIceberg behind the scenes, which doesn't support the full range of features you can leverage for Iceberg tables
It is a custom implementation

I also want to get more feedback from the community about what level of support and which features they would expect from such a dataset (or datasets). I would also like to hear from users about how they currently use Iceberg tables with Kedro.

Possible Implementation

Extend the custom example from docs
Use other libraries in the backend
Spark + Iceberg dataset
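
To make the options above a bit more concrete, here is a minimal sketch of how a dataset could keep a Kedro-style load/save surface while delegating to a swappable backend. Everything in it is hypothetical: `IcebergBackend`, `InMemoryBackend` and `IcebergTableDataset` are illustrative names, not existing kedro-datasets or PyIceberg APIs, and the in-memory backend only stands in for a real engine so the snippet runs without a catalog. A real dataset would presumably subclass `kedro.io.AbstractDataset`, and each adapter would call into the chosen engine.

```python
from typing import Any, Dict, List, Protocol

Rows = List[Dict[str, Any]]
VALID_MODES = ("overwrite", "append")


class IcebergBackend(Protocol):
    """Hypothetical adapter interface; one implementation per engine."""

    def read(self, table: str) -> Rows: ...

    def write(self, table: str, rows: Rows, mode: str) -> None: ...


class InMemoryBackend:
    """Stand-in backend (an assumption) so the sketch runs without a catalog."""

    def __init__(self) -> None:
        self._tables: Dict[str, Rows] = {}

    def read(self, table: str) -> Rows:
        if table not in self._tables:
            raise KeyError(f"table {table!r} does not exist")
        # Return a copy so callers cannot mutate the stored rows.
        return list(self._tables[table])

    def write(self, table: str, rows: Rows, mode: str) -> None:
        if mode == "overwrite":
            self._tables[table] = list(rows)
        else:  # "append"; the mode is validated by the dataset
            self._tables.setdefault(table, []).extend(rows)


class IcebergTableDataset:
    """Mirrors the load/save/_describe shape of a Kedro dataset.

    This sketch deliberately does not import Kedro; it only shows where
    the backend would plug in.
    """

    def __init__(
        self, table_name: str, backend: IcebergBackend, save_mode: str = "overwrite"
    ) -> None:
        if save_mode not in VALID_MODES:
            raise ValueError(f"save_mode must be one of {VALID_MODES}")
        self._table_name = table_name
        self._backend = backend
        self._save_mode = save_mode

    def load(self) -> Rows:
        return self._backend.read(self._table_name)

    def save(self, data: Rows) -> None:
        self._backend.write(self._table_name, data, self._save_mode)

    def _describe(self) -> Dict[str, Any]:
        return {"table_name": self._table_name, "save_mode": self._save_mode}


if __name__ == "__main__":
    backend = InMemoryBackend()
    writer = IcebergTableDataset("db.events", backend)
    writer.save([{"id": 1, "kind": "click"}])
    print(writer.load())
```

Keeping the engine behind a small interface like this would let the same catalog entry switch between, say, a PyIceberg-based and a Spark-based adapter, at the cost of agreeing on a common in-memory data type for `load` and `save`.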

Mar 07 '25 10:03 ankatiyar

I'd vote for Polars or DuckDB via Ibis

Mar 07 '25 12:03 datajoely

I'd recommend taking a look at https://github.com/dagster-io/community-integrations/tree/main/libraries/dagster-iceberg as a point of comparison (and potential starting point); Dagster I/O managers are fairly analogous to Kedro-Datasets in that they wrap a high-level load and save method, and @JasperHG90 did a great PyIceberg-based implementation. There's also a WIP Spark I/O manager, but that likely should be a Spark dataset, if anything.

It's worth noting that Kedro doesn't have as well-defined a concept of partitioning, so that may not translate without more work.

Mar 25 '25 16:03 deepyaman