feat(ingest/glue): Include environment on container guid key of container URN
Checklist
We are currently excluding environment while generating the GUID key for container urn.
This PR introducing a configurable option include_env_on_container_key which can be used if user wish to consider environment as well along with instance while generating the GUID of a container urn to avoid collision for same instance and different environment.
PR changes include:
-
A new config flag
include_env_on_container_keyis added, if it is set toTruethenenvwill be included to generate container GUID key along with platforminstance. -
Updated Glue connector code to use the
include_env_on_container_keyto generatecontainerguid key.
- [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
@hsheth2 The env is deprecated and we are promoting to use platform-instance instead of env, however this PR is introducing the env in container key generation, so I am not sure if we want to go ahead with the PR.
Need your input on this PR
@hsheth2 The main driver for us is consistent model and user experience with other entities. Even if environment is not required for uniqueness, all other entities have the environment as a descriptor.
The deprecation warning was recently removed here https://github.com/datahub-project/datahub/pull/10581, is that a rollback on the plan to move environment to platform instance? If so, that would be good reason to add the missing environment descriptor in the only data asset entity missing it.
@hsheth2
Consistency may not be reason enough to add environment as part of the identity of the containers. 👌
However, our users find confusing that there is an environment filter for most of the entities in the catalogue such as datasets, charts, dashboards, etc, but containers. Even, datasets belonging to a container have the environment filter while the container itself misses it.
To fix this inconsistent user experience, is there any way that we could enable the environment filter capability to containers while not including it as part of the identity (the URN)?
@sgomezvillamor that makes a ton of sense. We recently did something similar for datajobs/dataflows https://github.com/datahub-project/datahub/pull/10814
@sgomezvillamor that makes a ton of sense. We recently did something similar for datajobs/dataflows #10814
I see. That would require to add the env property to the eg container properties aspect.
/**
* Environment for this job
*/
@Searchable = {
"fieldType": "KEYWORD",
"addToFilters": true,
"filterNameOverride": "Environment",
"queryByDefault": false
}
env: optional FabricType
We have concluded that we will not move ahead with the proposal of this PR as it will not be necessary for us, hence closing the PR. Thank you folks for your time to review and providing valuable feedback!