datahub icon indicating copy to clipboard operation
datahub copied to clipboard

feat(ingest/glue): Include environment on container guid key of container URN

Open siladitya2 opened this issue 1 year ago • 2 comments

Checklist

We are currently excluding environment while generating the GUID key for container urn. This PR introducing a configurable option include_env_on_container_key which can be used if user wish to consider environment as well along with instance while generating the GUID of a container urn to avoid collision for same instance and different environment.

PR changes include:

  1. A new config flag include_env_on_container_key is added, if it is set to True then env will be included to generate container GUID key along with platform instance.

  2. Updated Glue connector code to use the include_env_on_container_key to generate container guid key.

  • [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [ ] Links to related issues (if applicable)
  • [ ] Tests for the changes have been added/updated (if applicable)
  • [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

siladitya2 avatar Jun 14 '24 09:06 siladitya2

@hsheth2 The env is deprecated and we are promoting to use platform-instance instead of env, however this PR is introducing the env in container key generation, so I am not sure if we want to go ahead with the PR.

Need your input on this PR

sid-acryl avatar Jun 27 '24 06:06 sid-acryl

@hsheth2 The main driver for us is consistent model and user experience with other entities. Even if environment is not required for uniqueness, all other entities have the environment as a descriptor.

The deprecation warning was recently removed here https://github.com/datahub-project/datahub/pull/10581, is that a rollback on the plan to move environment to platform instance? If so, that would be good reason to add the missing environment descriptor in the only data asset entity missing it.

siladitya2 avatar Jun 28 '24 14:06 siladitya2

@hsheth2

Consistency may not be reason enough to add environment as part of the identity of the containers. 👌

However, our users find confusing that there is an environment filter for most of the entities in the catalogue such as datasets, charts, dashboards, etc, but containers. Even, datasets belonging to a container have the environment filter while the container itself misses it.

To fix this inconsistent user experience, is there any way that we could enable the environment filter capability to containers while not including it as part of the identity (the URN)?

sgomezvillamor avatar Jul 22 '24 16:07 sgomezvillamor

@sgomezvillamor that makes a ton of sense. We recently did something similar for datajobs/dataflows https://github.com/datahub-project/datahub/pull/10814

hsheth2 avatar Jul 23 '24 19:07 hsheth2

@sgomezvillamor that makes a ton of sense. We recently did something similar for datajobs/dataflows #10814

I see. That would require to add the env property to the eg container properties aspect.

  /**
   * Environment for this job
   */
  @Searchable = {
    "fieldType": "KEYWORD",
    "addToFilters": true,
    "filterNameOverride": "Environment",
    "queryByDefault": false
  }
  env: optional FabricType

sgomezvillamor avatar Jul 24 '24 14:07 sgomezvillamor

We have concluded that we will not move ahead with the proposal of this PR as it will not be necessary for us, hence closing the PR. Thank you folks for your time to review and providing valuable feedback!

siladitya2 avatar Aug 05 '24 13:08 siladitya2