feat(ingest): add ability to preserve dbt table identifier casing
Summary
Resolves https://github.com/datahub-project/datahub/issues/7853.
Checklist
- [x] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [x] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
@viplazylmht the code overall looks good here.
However, I've generally found that setting convert_dataset_urns to True everywhere produces correct lineage more reliably, so I'm curious to understand the motivation behind this PR.
Hi @viplazylmht - we haven't seen activity on this PR for a while. Are you still interested in contributing? If we don't hear back from you within a week, we'll go ahead and close it.
@hsheth2 @laulpogan I'm here. convert_dataset_urns_to_lowercase currently defaults to True, so this change will not break any existing lineage.
In my case, I use DataHub with dbt and BigQuery. The BigQuery adapter has a convert_urns_to_lowercase option, but it defaults to False. As a result, the URNs produced by the two sources are completely different, because our BigQuery tables are named in UPPERCASE.
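A rough sketch of the mismatch, for illustration only. The helper `build_dataset_urn` and the table name are hypothetical and are not DataHub's actual code; only the general dataset URN layout is assumed here.

```python
def build_dataset_urn(platform: str, name: str, env: str = "PROD",
                      lowercase: bool = False) -> str:
    """Hypothetical helper that mimics the layout of a DataHub dataset URN."""
    if lowercase:
        # Mirrors what a "convert URNs to lowercase" option would do.
        name = name.lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Made-up table name with uppercase dataset and table identifiers.
table = "my-project.ANALYTICS.ORDERS"

# BigQuery ingestion with lowercasing disabled keeps the original casing.
bigquery_urn = build_dataset_urn("bigquery", table, lowercase=False)

# dbt ingestion with lowercasing enabled emits a different URN for the same table.
dbt_urn = build_dataset_urn("bigquery", table, lowercase=True)

print(bigquery_urn)
print(dbt_urn)
print(bigquery_urn == dbt_urn)
```

Because the two strings differ, DataHub would treat them as two separate datasets, so the dbt metadata would not attach to the entities already ingested from BigQuery.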
I am planning to integrate dbt into our existing DataHub + BigQuery production environment, so dbt needs the same config option. The alternative would be dropping all current metadata and re-ingesting everything.