datahub
datahub copied to clipboard
feat(ingestion): adds the optional instance field if given when producing the container entities
This PR is adding the instance
field if given to the DataPlatformInstance
aspect when producing container entities.
Checklist
- [x] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [x] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
Unit Test Results (metadata ingestion)
5 files 5 suites 1h 29m 38s :stopwatch: 555 tests 552 :heavy_check_mark: 3 :zzz: 0 :x: 2 552 runs 2 477 :heavy_check_mark: 75 :zzz: 0 :x:
Results for commit 60342467.
:recycle: This comment has been updated with latest results.
@treff7es you were working on some stuff related to this recently, so sharing with you
Hi, any update? /cc: @treff7es @jjoyce0510
Hi, any update? /cc: @treff7es @jjoyce0510
I just checked and basically what we do is to generate PlatformInstance even if the platform is not set in the config or not supported by the source. I need to ask around if this generalization makes sense for all the sources because afaik there are sources where the PlatformInstance does not make sense.
Unless I'm missing something 😅 ...
I just checked and basically what we do is to generate PlatformInstance even if the platform is not set in the config or not supported by the source.
If the platform is not set in the config or not supported by the source, the PlatformInstance aspect for the container upsert event was already generated but only with the platform
detail. This PR just adds the instance
detail for those cases where the platform instance is supported by the source and set in the config.
I need to ask around if this generalization makes sense for all the sources because afaik there are sources where the PlatformInstance does not make sense
In those cases, the PlatformInstance aspect for the container upsert will include only the platform
not the instance (because in container_key.instance
should be null
Basically, what I want to solve in this PR is the following misalignment:
- datasets exist in the context of a platform instance
- at the same time, those datasets are defined in the context of containers
- however, those containers do not hold any reference to the parent platform instance (well, actually they do, in the encoded container id 😅 ) Instead the PR tries to make platform instance info easily accessible for container entities.
Unit Test Results (build & test)
381 tests 381 :heavy_check_mark: 3m 4s :stopwatch: 89 suites 0 :zzz: 89 files 0 :x:
Results for commit 60342467.
:recycle: This comment has been updated with latest results.
Hi, is there any update on this? /cc: @treff7es @jjoyce0510
I could split the PR in two, if that helps to move this forward somehow:
- on the one side, align the behaviour of the kafka connector with the kafka connect one, as pointed out here https://github.com/datahub-project/datahub/pull/4665#discussion_r849822074
- on the other, adding the platform instance detail for containers to fix the misalignment pointed out here https://github.com/datahub-project/datahub/pull/4665#issuecomment-1105688384
Fixing conflicts is just postponed 😅 until we have confirmation these updates make sense to you too.
Looks like there are unfortunately a few merge conflicts here. Let's find some time to make this work and get this one in!
Just being a little bit lazy and wanted to have some confirmation from you about this PR before spending some time on fixing them 😅
Thanks! I will fix conflicts... on the next days 😄
@sgomezvillamor I'm working on https://github.com/datahub-project/datahub/pull/6083, which should accomplish the end goal as this PR while maintaining backwards compat a bit better.
Superseded by https://github.com/datahub-project/datahub/pull/6083