feat(ingest): add ability to preserve dbt table identifier casing

Open · viplazylmht opened this pull request 2 years ago · 3 comments

Summary

Resolves issue https://github.com/datahub-project/datahub/issues/7853.

Checklist

  • [x] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [x] Links to related issues (if applicable)
  • [ ] Tests for the changes have been added/updated (if applicable)
  • [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

viplazylmht, Apr 19 '23

@viplazylmht The code looks good overall.

However, I've generally found that setting convert_dataset_urns to True everywhere produces correct lineage more reliably. So I'm curious to understand the motivation behind this PR.

hsheth2, May 24 '23

Hi @viplazylmht - we haven't seen activity on this PR for a little bit, are you still interested in contributing? If not we'll go ahead and close it if we haven't heard back from you in a week!

laulpogan, Jun 07 '23

@hsheth2 @laulpogan I'm still here. In this PR, convert_dataset_urns_to_lowercase defaults to True, so it will not break any existing lineage.

In my case, I use DataHub with dbt and BigQuery. The BigQuery source has a convert_urns_to_lowercase configuration, but it defaults to False. Because our BigQuery tables are named in UPPERCASE, the URNs it produces are completely different from the lowercased URNs that dbt produces.

I am planning to integrate dbt into our existing DataHub and BigQuery production environment, so the dbt source needs an equivalent option. Otherwise we would have to drop all current metadata and re-ingest everything.

viplazylmht, Jun 11 '23
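The casing mismatch described in this thread can be illustrated with a minimal sketch. It assumes, as the discussion states, that the dbt source lowercases dataset names by default while the BigQuery source preserves case unless convert_urns_to_lowercase is enabled. The helper dataset_urn and the table name my_project.ANALYTICS.ORDERS are hypothetical and only mimic the shape of a DataHub dataset URN; this is not DataHub's own code.

```python
# Hypothetical sketch: dataset_urn only mimics the shape of a DataHub dataset
# URN; it is not DataHub's implementation.

def dataset_urn(platform: str, name: str, lowercase: bool, env: str = "PROD") -> str:
    """Build a DataHub-style dataset URN, optionally lowercasing the name."""
    if lowercase:
        name = name.lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


table = "my_project.ANALYTICS.ORDERS"  # hypothetical uppercase BigQuery table

# BigQuery source with convert_urns_to_lowercase left at its default (False).
bigquery_urn = dataset_urn("bigquery", table, lowercase=False)

# dbt source lowercasing names (existing behaviour, kept as the PR's default).
dbt_default_urn = dataset_urn("bigquery", table, lowercase=True)

# dbt source with the proposed convert_dataset_urns_to_lowercase set to False.
dbt_preserved_urn = dataset_urn("bigquery", table, lowercase=False)

# Reviewer's alternative: lowercase on the BigQuery side as well.
bigquery_lower_urn = dataset_urn("bigquery", table, lowercase=True)

print(bigquery_urn == dbt_default_urn)        # False: lineage does not join
print(bigquery_urn == dbt_preserved_urn)      # True: dbt matches existing tables
print(bigquery_lower_urn == dbt_default_urn)  # True, but BigQuery must be re-ingested
```

Both approaches yield matching URNs; the difference is which side changes. Lowercasing everywhere requires re-ingesting the existing BigQuery metadata, while preserving case in dbt leaves the already-ingested BigQuery entities untouched.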