dlt
Feat/iceberg advanced partitioning
Hey Team,
I’ve been using dlt for the past 3–4 months, mostly with Apache Iceberg as the destination. Recently, I needed support for Iceberg partitioning, especially for more advanced use cases like time and bucket partitions.
I’ve implemented support for these in a way that’s fully compatible with existing column-level partition configurations. It still works with earlier formats like:
{ "region": { "partition": true }, "category": { "partition": true } }
Now also supports advanced options like:
{ "date_added": { "partition": { "type": "year", "index": 1, "name": "yearly_partition" } }, "user_id": { "partition": { "type": "bucket", "index": 2, "bucket_count": 32, "name": "user_bucket" } }, "region": { "partition": { "type": "identity", "index": 3 } } }
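To make the compatibility claim concrete, here is a minimal sketch of how both hint formats could be normalized into one ordered list of partition fields. This is not the code from this PR or from dlt: `PartitionField`, `normalize_partition_hints`, the default field names, and the rule that legacy boolean hints sort after explicitly indexed ones are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class PartitionField:
    """Hypothetical normalized form of one column's partition hint."""

    column: str
    transform: str  # e.g. "identity", "year", "bucket"
    index: Optional[int]  # explicit ordering; None for the legacy boolean form
    name: str  # partition field name
    bucket_count: Optional[int] = None


def normalize_partition_hints(columns: Dict[str, Dict[str, Any]]) -> List[PartitionField]:
    """Accept both `"partition": true` and `"partition": {...}` column hints."""
    fields: List[PartitionField] = []
    for column, hints in columns.items():
        hint = hints.get("partition")
        if not hint:
            continue  # hint absent or False: column is not partitioned
        if hint is True:
            # Legacy format: treated here as an identity partition on the column.
            fields.append(PartitionField(column, "identity", None, column))
            continue
        if not isinstance(hint, dict):
            raise TypeError(f"unsupported partition hint for column {column!r}")
        transform = hint.get("type", "identity")
        bucket_count = hint.get("bucket_count")
        if transform == "bucket" and not isinstance(bucket_count, int):
            raise ValueError(f"bucket partition on {column!r} needs an integer bucket_count")
        # Assumed default name: the column itself for identity, else column_transform.
        default_name = column if transform == "identity" else f"{column}_{transform}"
        fields.append(
            PartitionField(
                column=column,
                transform=transform,
                index=hint.get("index"),
                name=hint.get("name", default_name),
                bucket_count=bucket_count,
            )
        )
    # Assumed ordering: explicitly indexed fields first (ascending index),
    # then legacy fields in declaration order.
    indexed = sorted((f for f in fields if f.index is not None), key=lambda f: f.index)
    unindexed = [f for f in fields if f.index is None]
    return indexed + unindexed


if __name__ == "__main__":
    advanced = {
        "date_added": {"partition": {"type": "year", "index": 1, "name": "yearly_partition"}},
        "user_id": {"partition": {"type": "bucket", "index": 2, "bucket_count": 32, "name": "user_bucket"}},
        "region": {"partition": {"type": "identity", "index": 3}},
    }
    for field in normalize_partition_hints(advanced):
        print(field)
```

Normalizing to a single internal shape first means the code that builds the actual Iceberg partition spec only has to handle one format, whichever way the user wrote the hint.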
Would love feedback from the team!
Deploy Preview for dlt-hub-docs canceled.
| Name | Link |
|---|---|
| Latest commit | 5916083f1a6abd6fe6aeedf90720c998dafa1b60 |
| Latest deploy log | https://app.netlify.com/projects/dlt-hub-docs/deploys/68b70681267a61000844a333 |
Hi @rakesh-tmdc, thanks for the contribution, this looks good and useful. In dlt+ we already have an iceberg_adapter() with iceberg_partition helpers for these transforms. We're open to moving this adapter module to open source dlt so your PR can reuse it and stay fully compatible with our existing semantics/docs.
If you're up for it, we can extract the adapter and have your changes delegate partition spec parsing/validation to it to keep behavior consistent across catalogs.
Thanks @burnash, glad to hear this is useful! Extracting the iceberg_adapter and its partition helpers into open source sounds like a great idea. I’d definitely prefer to reuse that instead of duplicating logic.
Once it’s available, I can rework my PR so that partition spec parsing/validation delegates to the adapter, which should keep things consistent. Just let me know when/where the adapter lands, and I’ll update accordingly.
Hi @rakesh-tmdc,
I've ported the iceberg_adapter() and iceberg_partition helpers from dlt+ to dlt core. These are now available in dlt/destinations/impl/filesystem/iceberg_adapter.py and provide the canonical way in dlt to configure Iceberg partitioning going forward.
Now that we have the adapter in place, here's what needs to happen next to complete this PR:
- Remove the old implementation from dlt/common/libs/pyiceberg.py. The following code should be deleted as it's now superseded by the iceberg_adapter: the PartitionType enum and the old PartitionSpec dataclass; IcebergPartitionManager; and the _validate_partition_spec(), _validate_and_fix_indices(), and extract_partition_specs_from_schema() functions.
- Reset the partition hint type in dlt/common/schema/typing.py back to Optional[bool].
- Update the tests in tests/common/libs/test_pyiceberg.py so they no longer test the removed functions and instead cover the new adapter-based approach. You can see an example of iceberg_adapter() and iceberg_partition usage in the iceberg_adapter() docstring.
You can also test backward compatibility with the older way of defining an identity partition by running:
TESTS__BUCKET_URL_FILE="_storage/data" pytest tests/load/pipeline/test_open_table_pipeline.py::test_table_format_partitioning -k "iceberg"
where _storage/data is a path to a local folder for the Iceberg files.
Let me know if you have any questions!
Hi @rakesh-tmdc, I wanted to check in and see if you're still interested in continuing with this PR. If you'd like to move forward, great. If you're currently busy or have moved on to other priorities, that's completely understandable, and I can take over from here.
Either way, thanks for the contribution and for kicking off this feature.
Hi @rakesh-tmdc,
Just checking in again. We'd really like to get this PR merged and we now have the iceberg_adapter API in place.
Since this PR has been quiet for a bit, we’ll go ahead and remove the old implementation in pyiceberg.py and adapt the tests to the new adapter-based API. If you'd still like to finish it yourself, just shout and we can coordinate. Thanks again for starting this work!
Deploying with Cloudflare Workers
The latest updates on your project. Learn more about integrating Git with Workers.
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | docs | 093ad285 | Commit Preview URL, Branch Preview URL | Nov 28 2025, 11:52 PM |
Hi @burnash — sorry for the delayed response! I missed some of the recent updates. If you’d still like me to continue working on this PR, I’m happy to pick it back up and follow through with the remaining changes.
Thanks again for all the work you’ve put into the adapter and tests!