datahub icon indicating copy to clipboard operation
datahub copied to clipboard

feat(ingestion): adds the optional instance field if given when producing the container entities

Open sgomezvillamor opened this issue 2 years ago • 9 comments

This PR is adding the instance field if given to the DataPlatformInstance aspect when producing container entities.

Checklist

  • [x] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [ ] Links to related issues (if applicable)
  • [x] Tests for the changes have been added/updated (if applicable)
  • [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

sgomezvillamor avatar Apr 13 '22 19:04 sgomezvillamor

Unit Test Results (metadata ingestion)

       5 files         5 suites   1h 29m 38s :stopwatch:    555 tests    552 :heavy_check_mark:   3 :zzz: 0 :x: 2 552 runs  2 477 :heavy_check_mark: 75 :zzz: 0 :x:

Results for commit 60342467.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Apr 13 '22 20:04 github-actions[bot]

@treff7es you were working on some stuff related to this recently, so sharing with you

sgomezvillamor avatar Apr 14 '22 11:04 sgomezvillamor

Hi, any update? /cc: @treff7es @jjoyce0510

sgomezvillamor avatar Apr 21 '22 07:04 sgomezvillamor

Hi, any update? /cc: @treff7es @jjoyce0510

I just checked and basically what we do is to generate PlatformInstance even if the platform is not set in the config or not supported by the source. I need to ask around if this generalization makes sense for all the sources because afaik there are sources where the PlatformInstance does not make sense.

treff7es avatar Apr 21 '22 10:04 treff7es

Unless I'm missing something 😅 ...

I just checked and basically what we do is to generate PlatformInstance even if the platform is not set in the config or not supported by the source.

If the platform is not set in the config or not supported by the source, the PlatformInstance aspect for the container upsert event was already generated but only with the platform detail. This PR just adds the instance detail for those cases where the platform instance is supported by the source and set in the config.

I need to ask around if this generalization makes sense for all the sources because afaik there are sources where the PlatformInstance does not make sense

In those cases, the PlatformInstance aspect for the container upsert will include only the platform not the instance (because in container_key.instance should be null

Basically, what I want to solve in this PR is the following misalignment:

  • datasets exist in the context of a platform instance
  • at the same time, those datasets are defined in the context of containers
  • however, those containers do not hold any reference to the parent platform instance (well, actually they do, in the encoded container id 😅 ) Instead the PR tries to make platform instance info easily accessible for container entities.

sgomezvillamor avatar Apr 21 '22 19:04 sgomezvillamor

Unit Test Results (build & test)

381 tests   381 :heavy_check_mark:  3m 4s :stopwatch:   89 suites      0 :zzz:   89 files        0 :x:

Results for commit 60342467.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Apr 30 '22 09:04 github-actions[bot]

Hi, is there any update on this? /cc: @treff7es @jjoyce0510

I could split the PR in two, if that helps to move this forward somehow:

  • on the one side, align the behaviour of the kafka connector with the kafka connect one, as pointed out here https://github.com/datahub-project/datahub/pull/4665#discussion_r849822074
  • on the other, adding the platform instance detail for containers to fix the misalignment pointed out here https://github.com/datahub-project/datahub/pull/4665#issuecomment-1105688384

Fixing conflicts is just postponed 😅 until we have confirmation these updates make sense to you too.

sgomezvillamor avatar Jun 02 '22 08:06 sgomezvillamor

Looks like there are unfortunately a few merge conflicts here. Let's find some time to make this work and get this one in!

jjoyce0510 avatar Jun 06 '22 15:06 jjoyce0510

Just being a little bit lazy and wanted to have some confirmation from you about this PR before spending some time on fixing them 😅

Thanks! I will fix conflicts... on the next days 😄

sgomezvillamor avatar Jun 06 '22 19:06 sgomezvillamor

@sgomezvillamor I'm working on https://github.com/datahub-project/datahub/pull/6083, which should accomplish the end goal as this PR while maintaining backwards compat a bit better.

hsheth2 avatar Sep 28 '22 23:09 hsheth2

Superseded by https://github.com/datahub-project/datahub/pull/6083

sgomezvillamor avatar Sep 29 '22 08:09 sgomezvillamor