Tags and glossary terms created via the Python API do not render correctly in the DataHub frontend
Describe the bug
Tags and glossary terms created via the Python API appear on datasets and are searchable in the DataHub frontend (Screenshot 1), but they cannot be assigned to datasets from within the frontend. Only tags and glossary terms created in the UI can be assigned to datasets in the UI (Screenshot 2). Glossary terms created from Python code also do not show up in the Glossary section of the frontend (Screenshot 3). You can click on one of these Python-created glossary terms where it is attached to a dataset and navigate to it. However, you cannot edit it from the UI, which reports that the glossary term does not exist (Screenshot 4).
The URN of a glossary term also varies depending on how it was created:
- Created via the Python API: urn:li:glossaryTerm:term1
- Created via the DataHub UI: urn:li:glossaryTerm:6aabe862-529e-4033-aad3-bc2462e23035
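The two example URNs differ in how the id after `urn:li:glossaryTerm:` is formed. `builder.make_term_urn` uses the term name verbatim, while the UI-created id looks like a random version-4 UUID. The sketch below is a hypothetical diagnostic helper, not part of DataHub, that tells the two forms apart. The assumption that every UI-created term gets a UUID id rests only on the single example above.

```python
import uuid

TERM_PREFIX = "urn:li:glossaryTerm:"


def term_id(urn: str) -> str:
    """Return the id part of a glossaryTerm URN, e.g. 'term1' for 'urn:li:glossaryTerm:term1'."""
    if not urn.startswith(TERM_PREFIX):
        raise ValueError(f"not a glossaryTerm URN: {urn!r}")
    return urn[len(TERM_PREFIX):]


def looks_ui_generated(urn: str) -> bool:
    """Heuristic (assumption): True if the term id is a canonical version-4 UUID string."""
    ident = term_id(urn)
    try:
        parsed = uuid.UUID(ident)
    except ValueError:
        # Name-based ids such as "term1" are not valid UUIDs.
        return False
    # uuid.UUID also accepts non-canonical forms (no hyphens, braces), so require a round trip.
    return str(parsed) == ident.lower() and parsed.version == 4


print(looks_ui_generated("urn:li:glossaryTerm:term1"))  # False
print(looks_ui_generated("urn:li:glossaryTerm:6aabe862-529e-4033-aad3-bc2462e23035"))  # True
```

A check like this could help when auditing which terms on a dataset were created outside the UI. It says nothing about why the UI treats the two kinds differently.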
To Reproduce
Steps to reproduce the behavior:
- Run the following Python code (adapted from here) to create a dataset and assign glossary terms to it:
import time

import datahub.emitter.mce_builder as builder
import datahub.metadata.schema_classes as models
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

GMS_HOST_URL = "http://localhost:8080"


def emit_terms(dataset_urn, terms_list):
    """Attach the given glossary terms to a dataset by upserting its glossaryTerms aspect."""
    graph = DataHubGraph(DatahubClientConfig(server=GMS_HOST_URL))
    # make_term_urn("term1FromPython") -> "urn:li:glossaryTerm:term1FromPython"
    terms = [
        models.GlossaryTermAssociationClass(urn=builder.make_term_urn(t))
        for t in terms_list
    ]
    audit_stamp = models.AuditStampClass(
        time=int(time.time() * 1000), actor="urn:li:corpuser:datahub"
    )
    glossary_terms = models.GlossaryTermsClass(terms=terms, auditStamp=audit_stamp)
    # This proposal targets the dataset only; nothing is written on the term entities themselves.
    metadata_event = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=models.ChangeTypeClass.UPSERT,
        entityUrn=dataset_urn,
        aspectName="glossaryTerms",
        aspect=glossary_terms,
    )
    graph.emit(metadata_event)


dataset_urn = builder.make_dataset_urn("s3", "test")
terms = ["term1FromPython", "term2FromPython"]
emit_terms(dataset_urn, terms)
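The reproduction script upserts a single aspect, `glossaryTerms`, on the dataset. It never writes anything on the term entities those URNs point to. The stdlib-only sketch below models that with a hypothetical in-memory store and lists term URNs that are referenced but have no aspect of their own. `dataset_urn` and `term_urn` are illustrative mirrors of `builder.make_dataset_urn` and `builder.make_term_urn`, assuming the default `PROD` environment. The idea that the UI needs a term-level aspect (such as `glossaryTermInfo`) before it treats a term as existing is an unconfirmed assumption, not something established in this thread.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for emitted metadata: (entity URN, aspect name) -> aspect payload.
Emitted = Dict[Tuple[str, str], dict]


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Illustrative mirror of make_dataset_urn (assumes the default env "PROD")."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def term_urn(name: str) -> str:
    """Illustrative mirror of make_term_urn: the name becomes the URN id verbatim."""
    return f"urn:li:glossaryTerm:{name}"


def emit_terms(store: Emitted, ds_urn: str, names: List[str]) -> None:
    """Record the only proposal the script sends: the dataset's glossaryTerms aspect."""
    store[(ds_urn, "glossaryTerms")] = {"terms": [{"urn": term_urn(n)} for n in names]}


def dangling_terms(store: Emitted) -> Set[str]:
    """Term URNs referenced from a glossaryTerms aspect but with no aspect of their own."""
    referenced = {
        t["urn"]
        for (_, aspect), payload in store.items()
        if aspect == "glossaryTerms"
        for t in payload["terms"]
    }
    entities_with_aspects = {urn for (urn, _) in store}
    return referenced - entities_with_aspects


store: Emitted = {}
ds = dataset_urn("s3", "test")
emit_terms(store, ds, ["term1FromPython", "term2FromPython"])
print(ds)  # urn:li:dataset:(urn:li:dataPlatform:s3,test,PROD)
print(sorted(dangling_terms(store)))
```

Under this model both terms come back as dangling after running the script. If the assumption holds, additionally emitting a term-level aspect for each URN would empty that set. This has not been verified here.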
Expected behavior
Tags and glossary terms created via the Python API should behave the same way as tags and glossary terms created via the UI.
Screenshots
Screenshot 1
Screenshot 2
Screenshot 3
Screenshot 4
Additional context
The issue has been observed in DataHub version 0.9.0 and earlier.
I found some unexpected behaviour when associating GlossaryTerms with datasets, which seems to be related to mixed-case terms. Have you tested with all-lowercase terms? (see: https://datahubspace.slack.com/archives/C029A3M079U/p1667383361186989?thread_ts=1666343681.646089&cid=C029A3M079U)
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
Hi @brandongranbrem, is this still an issue for you?
Closing this one out as there hasn't been any follow-up. Please feel free to reopen if it's still an issue!
@aditya-radhakrishnan This issue is still present as of version v0.9.6.1. It does not appear that the letter case (upper, lower, mixed) of the glossary terms or tags is the cause of the issue. This bug should be re-opened.