amundsen
amundsen copied to clipboard
Adds ability to extract metadata from Google BigQuery using SQLAlchemy
feat – Adds ability to extract metadata from Google BigQuery using SQLAlchemy
Summary of Changes
Based on the Metadata Extractor for Snowflake, I created a metadata extractor for Google BigQuery, which queries the INFORMATION_SCHEMA instead of looping through all the projects and tables in the database.
- Addition of the CREDS_PATH parameter to the databuilder/databuilder/extractor/sql_alchemy_extractor.py file
- Addition of required libraries to the databuilder/setup.py file
- Creation of the databuilder/databuilder/extractor/bigquery_sql_alchemy_metadata_extractor.py file
- Creation of the databuilder/example/scripts/sample_bigquery_sql_alchemy_data_loader.py implementation file.
Tests
Implemented this feature successfully in a client's project.
Documentation
Similar to the Snowflake Metadata Extractor, in order to extract metadata from BigQuery using this method, the following parameters need to be provided by modifying the databuilder/example/scripts/sample_bigquery_sql_alchemy_data_loader.py file:
- BIGQUERY_PROJECT_KEY = 'YourBigqueryProjectId'
- BIGQUERY_TABLE_SCHEMA_KEY = 'YourBigqueryTableSchemaName'
- BIGQUERY_CONNECTION_STRING = f'bigquery://{BIGQUERY_PROJECT_KEY}'
- BIGQUERY_CREDENTIALS_PATH = ''
Refer this doc: https://github.com/googleapis/python-bigquery-sqlalchemy for supported connection parameters and configurations
CheckList
- [X] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
- In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
- [X] PR includes a summary of changes.
- [X] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
- [X] In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and the classes in the PR contain docstrings that explain what it does
Congratulations on your first Pull Request and welcome to Amundsen community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/amundsen-io/amundsen/blob/main/CONTRIBUTING.md)
cc @xuan616
That's correct
On Mon, Feb 28, 2022, 20:06 Xuan @.***> wrote:
@.**** commented on this pull request.
In databuilder/databuilder/extractor/bigquery_sql_alchemy_metadata_extractor.py https://github.com/amundsen-io/amundsen/pull/1740#discussion_r816371448:
self._database = conf.get_string(BigQueryMetadataExtractor.DATABASE_KEY)
self._schema = conf.get_string(BigQueryMetadataExtractor.DATABASE_KEY)self._bigquery_project = conf.get_string(BigQueryMetadataExtractor.BIGQUERY_PROJECT_KEY)self._bigquery_table_schema = conf.get_string(BigQueryMetadataExtractor.BIGQUERY_TABLE_SCHEMA_KEY)self._bigquery_schema = conf.get_string(BigQueryMetadataExtractor.BIGQUERY_SCHEMA_KEY)self.sql_stmt = BigQueryMetadataExtractor.SQL_STATEMENT.format(where_clause_suffix=conf.get_string(BigQueryMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY),project=self._bigquery_project,table_schema=self._bigquery_table_schema,schema=self._bigquery_schema)LOGGER.info('SQL for bigquery metadata: %s', self.sql_stmt)self._alchemy_extractor = sql_alchemy_extractor.from_surrounding_config(conf, self.sql_stmt)does it support only keyfile path for now
— Reply to this email directly, view it on GitHub https://github.com/amundsen-io/amundsen/pull/1740#pullrequestreview-895754086, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEA2ZBKFBFPGF2MLEYFPBI3U5QLX7ANCNFSM5PSGUIKA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you authored the thread.Message ID: @.***>
LGTM. Hi @avelasquezhe84 Can I know what is the difference from the current bigquery_metadata_extrator in terms of the results extracted when we're using sqlalchemy/information_schema, like the filter for sharded tables. Thanks
BTW, it would be better if there are unit tests for this new extractor. :smiley:
+1 on the unit test