datahub
feat(spark-lineage): support bigquery, s3, gcs
This PR:
- Adds support for BigQuery tables.
- Recognizes S3 and GCS paths and creates datasets for them with explicit platforms. Previously, the platform for these datasets was populated as `hdfs`.
- Adds support for configuring the dataset URN, similar to the Python sources.
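
To illustrate the platform recognition described above, here is a minimal Python sketch, not the PR's actual Java implementation. It assumes the platform is inferred from the path's URI scheme and that unknown schemes fall back to `hdfs`, matching the earlier behavior. The names `SCHEME_TO_PLATFORM`, `platform_for_path`, and `dataset_urn`, along with the `env` and `platform_instance` parameters, are hypothetical and used only for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical scheme-to-platform mapping (an assumption, not taken from the PR).
SCHEME_TO_PLATFORM = {
    "s3": "s3",
    "s3a": "s3",
    "s3n": "s3",
    "gs": "gcs",
    "hdfs": "hdfs",
}


def platform_for_path(path: str) -> str:
    """Infer the platform from the URI scheme; fall back to hdfs."""
    scheme = urlparse(path).scheme.lower()
    return SCHEME_TO_PLATFORM.get(scheme, "hdfs")


def dataset_urn(path: str, env: str = "PROD",
                platform_instance: Optional[str] = None) -> str:
    """Build a dataset URN string for a storage path.

    env and platform_instance are illustrative stand-ins for the
    configurable URN parts the PR description mentions.
    """
    parsed = urlparse(path)
    platform = platform_for_path(path)
    if platform in ("s3", "gcs"):
        # Keep the bucket name as part of the dataset name.
        name = f"{parsed.netloc}{parsed.path}".rstrip("/")
    else:
        name = parsed.path.rstrip("/") or path
    if platform_instance:
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


if __name__ == "__main__":
    print(dataset_urn("s3a://my-bucket/data/events/"))
    print(dataset_urn("gs://my-bucket/tables/users", env="DEV"))
    print(dataset_urn("hdfs://namenode:8020/warehouse/orders"))
```

With a mapping like this, an `s3a://` or `gs://` path yields a URN on the `s3` or `gcs` platform rather than `hdfs`, which is the behavior change the description refers to.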
Checklist
- [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
Unit Test Results (build & test)
597 tests ±0: 593 passed ±0, 4 skipped ±0, 0 failed ±0. 147 suites ±0, 147 files ±0. Duration: 12m 1s (-18s).
Results for commit da5cd672. ± Comparison against base commit 333598fd.
This comment has been updated with the latest results.
Watching this PR. This feature would be a big addition!
Any update on this PR?
We are actively looking for volunteers to help revive and drive this PR to the finish line!
Cheers, John