feat(spark-lineage): support bigquery, s3, gcs

Open MugdhaHardikar-GSLab opened this issue 2 years ago • 3 comments

This PR:

  1. Adds support for BigQuery tables.
  2. Recognizes S3 and GCS paths and creates datasets for them with explicit platforms. Previously, the platform for these was populated as hdfs.
  3. Adds support for configuring the dataset URN, similar to the Python sources.
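
To illustrate item 2, here is a minimal sketch of deriving an explicit platform from a path's URI scheme and building a dataset URN from it. This is not the code in this PR (the spark-lineage module is Java). The scheme table, the `hdfs` fallback, and the helper names `platform_for` and `dataset_urn` are assumptions made for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-platform table; the PR's actual mapping may differ.
SCHEME_TO_PLATFORM = {
    "s3": "s3",
    "s3a": "s3",
    "s3n": "s3",
    "gs": "gcs",
    "hdfs": "hdfs",
}


def platform_for(uri: str) -> str:
    """Pick a platform from the URI scheme, falling back to hdfs (assumed default)."""
    scheme = urlparse(uri).scheme.lower()
    return SCHEME_TO_PLATFORM.get(scheme, "hdfs")


def dataset_urn(uri: str, env: str = "PROD") -> str:
    """Build a dataset URN string with an explicit platform for the given path."""
    parsed = urlparse(uri)
    # Use bucket (or host) plus path as the dataset name, without slashes at either end.
    name = (parsed.netloc + parsed.path).strip("/")
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform_for(uri)},{name},{env})"


if __name__ == "__main__":
    print(dataset_urn("s3a://my-bucket/data/events"))
    print(dataset_urn("gs://my-bucket/data/events"))
```

With a mapping like this, an `s3a://` path yields a dataset on the `s3` platform rather than `hdfs`, which is the behavior change described above.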

Checklist

  • [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [ ] Links to related issues (if applicable)
  • [ ] Tests for the changes have been added/updated (if applicable)
  • [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

MugdhaHardikar-GSLab avatar Aug 19 '22 13:08 MugdhaHardikar-GSLab

Unit Test Results (build & test)

147 files ±0   147 suites ±0   12m 1s :stopwatch: -18s
597 tests ±0   593 :heavy_check_mark: ±0   4 :zzz: ±0   0 :x: ±0

Results for commit da5cd672. ± Comparison against base commit 333598fd.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Aug 19 '22 14:08 github-actions[bot]

Watching this PR. This feature would be a big addition!

Jiafi avatar Sep 02 '22 14:09 Jiafi

Any update on this PR?

Jiafi avatar Sep 19 '22 17:09 Jiafi

We are actively looking for volunteers to help revive and drive this PR to the finish line!

Cheers, John

jjoyce0510 avatar Jan 24 '23 03:01 jjoyce0510