fix(ingestion): Fetch Upstreams From Columns
This PR fixes the following two things:
- The Tableau metadata GraphQL API returns an empty list for `upstreamTables` for embedded datasources, while the `upstreamColumns` field does include the information. This PR populates upstream table information from the `upstreamColumns` field returned by Tableau.
- The Tableau metadata GraphQL API returns malformed SQL queries, which cause failures when fetching upstream lineage from the CustomSQLs. This also enables embedded data sources to be connected to the CustomSQLs, which are (generally) connected to upstreams from other platforms, hence completing the full lineage.
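The fallback described in the first item can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the helper name `tables_from_upstream_columns` is hypothetical, and the response shape (each `upstreamColumns` entry carrying a `table` object with `id` and `name`) is an assumption about how the metadata query is written.

```python
from typing import Any, Dict, List


def tables_from_upstream_columns(datasource: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return upstream tables, falling back to upstreamColumns when upstreamTables is empty.

    Hypothetical helper: assumes each upstream column has a "table" dict with an "id".
    """
    tables = datasource.get("upstreamTables") or []
    if tables:
        # The API returned tables directly; nothing to derive.
        return list(tables)

    seen_ids = set()
    derived: List[Dict[str, Any]] = []
    for column in datasource.get("upstreamColumns") or []:
        table = (column or {}).get("table")
        if not table or not table.get("id"):
            # Column has no usable table reference; skip it.
            continue
        if table["id"] not in seen_ids:
            # Keep the first occurrence of each table, preserving order.
            seen_ids.add(table["id"])
            derived.append(table)
    return derived


# Example embedded datasource with an empty upstreamTables list (made-up data).
embedded = {
    "upstreamTables": [],
    "upstreamColumns": [
        {"name": "order_id", "table": {"id": "t1", "name": "orders"}},
        {"name": "amount", "table": {"id": "t1", "name": "orders"}},
        {"name": "customer_id", "table": {"id": "t2", "name": "customers"}},
        {"name": "calculated", "table": None},
    ],
}

print([t["name"] for t in tables_from_upstream_columns(embedded)])
# prints ['orders', 'customers']
```

Deduplicating by table `id` keeps one entry per table even when many columns point at the same upstream table, and columns without a table reference are skipped rather than raising.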
Checklist
- [x] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
@egemenberk some query cleaning logic is also getting added in this PR https://github.com/datahub-project/datahub/pull/9838 - that one also removes parameter names and things to make SQL parsing work. Does it make sense to unify across these two query cleaning implementations?
Hi @hsheth2, I've taken a quick look at the PR you mentioned and it seems to fix the query, so I can remove the clean_query() method call from my implementation. The main focus of my PR is fetching upstream lineage from upstreamColumns when the upstreamTables field is empty in the Tableau response. The clean_query addition was a side fix made while working on that task, so I can drop it and rely on the #9838 implementation. Thanks for the information 👍