datahub icon indicating copy to clipboard operation
datahub copied to clipboard

feat(ingestion): Add stateful ingestion to iceberg source

Open cccs-Dustin opened this issue 3 years ago • 5 comments

This PR adds the stateful ingestion feature/capability to the Iceberg source.

Checklist

  • [X] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [ ] Links to related issues (if applicable)
  • [X] Tests for the changes have been added/updated (if applicable)
  • [X] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

cccs-Dustin avatar Nov 02 '22 14:11 cccs-Dustin

@cccs-Dustin I see that this PR is still in draft - are you waiting for a review?

Also, we've made some simplifications to how stateful ingestion needs to be implemented. I'd recommend using this PR https://github.com/datahub-project/datahub/pull/6443 as an example. In particular, the IcebergCheckpointState can be simplified even more :)

hsheth2 avatar Nov 22 '22 04:11 hsheth2

Unit Test Results (metadata ingestion)

       8 files         8 suites   52m 55s :stopwatch:    755 tests    746 :heavy_check_mark: 2 :zzz: 2 :x: 5 :fire: 1 512 runs  1 500 :heavy_check_mark: 5 :zzz: 2 :x: 5 :fire:

For more details on these failures and errors, see this check.

Results for commit a91e3e0a.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Nov 22 '22 04:11 github-actions[bot]

Unit Test Results (build & test)

636 tests  ±0   632 :heavy_check_mark: ±0   15m 45s :stopwatch: -4s 164 suites ±0       4 :zzz: ±0  164 files   ±0       0 :x: ±0 

Results for commit a91e3e0a. ± Comparison against base commit 5b52534f.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Nov 22 '22 05:11 github-actions[bot]

@cccs-Dustin I see that this PR is still in draft - are you waiting for a review?

Also, we've made some simplifications to how stateful ingestion needs to be implemented. I'd recommend using this PR #6443 as an example. In particular, the IcebergCheckpointState can be simplified even more :)

Hi Harshal! The PR is currently a draft as we are testing the stateful feature locally. Once that testing is done and we are confident it works as intended I will open the PR back up and ask for a review :)

Thank you for mentioning the simplifications! I will go through the PR and apply the changes as per the PR you linked. Once that is done, would I be able to tag you on this PR to request a review (to make sure it is following all the new simplifications)?

cccs-Dustin avatar Nov 22 '22 12:11 cccs-Dustin

@cccs-Dustin absolutely - just @ me here and I'll take another look at when you're ready

hsheth2 avatar Nov 23 '22 22:11 hsheth2

Hi @hsheth2! I opened the PR as I have made the changes you requested, and I think it is now ready for review

cccs-Dustin avatar Dec 08 '22 15:12 cccs-Dustin