datahub
datahub copied to clipboard
feat(ingestion): Add stateful ingestion to iceberg source
This PR adds the stateful ingestion feature/capability to the Iceberg source.
Checklist
- [X] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [X] Tests for the changes have been added/updated (if applicable)
- [X] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
@cccs-Dustin I see that this PR is still in draft - are you waiting for a review?
Also, we've made some simplifications to how stateful ingestion needs to be implemented. I'd recommend using this PR https://github.com/datahub-project/datahub/pull/6443 as an example. In particular, the IcebergCheckpointState can be simplified even more :)
Unit Test Results (metadata ingestion)
8 files 8 suites 52m 55s :stopwatch: 755 tests 746 :heavy_check_mark: 2 :zzz: 2 :x: 5 :fire: 1 512 runs 1 500 :heavy_check_mark: 5 :zzz: 2 :x: 5 :fire:
For more details on these failures and errors, see this check.
Results for commit a91e3e0a.
:recycle: This comment has been updated with latest results.
Unit Test Results (build & test)
636 tests ±0 632 :heavy_check_mark: ±0 15m 45s :stopwatch: -4s 164 suites ±0 4 :zzz: ±0 164 files ±0 0 :x: ±0
Results for commit a91e3e0a. ± Comparison against base commit 5b52534f.
:recycle: This comment has been updated with latest results.
@cccs-Dustin I see that this PR is still in draft - are you waiting for a review?
Also, we've made some simplifications to how stateful ingestion needs to be implemented. I'd recommend using this PR #6443 as an example. In particular, the
IcebergCheckpointStatecan be simplified even more :)
Hi Harshal! The PR is currently a draft as we are testing the stateful feature locally. Once that testing is done and we are confident it works as intended I will open the PR back up and ask for a review :)
Thank you for mentioning the simplifications! I will go through the PR and apply the changes as per the PR you linked. Once that is done, would I be able to tag you on this PR to request a review (to make sure it is following all the new simplifications)?
@cccs-Dustin absolutely - just @ me here and I'll take another look at when you're ready
Hi @hsheth2! I opened the PR as I have made the changes you requested, and I think it is now ready for review