[Feature] Add support for GCP BigQuery
Summary
Add support for ingesting GCP BigQuery resources into Cartography. BigQuery is Google Cloud's fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. This feature would allow Cartography to track BigQuery datasets, tables, views, routines, models, and their access controls.
Motivation
BigQuery is a foundational data platform used across many organizations for analytics, ML, and data sharing. It represents a critical surface for security analysis due to the sensitive data it often contains. By ingesting BigQuery resources, Cartography can surface:
- Dataset and table inventory across projects
- IAM bindings at dataset and table levels (fine-grained access control)
- External tables and their connections to GCS, Drive, or other sources
- Views, their underlying table dependencies, and authorized-view grants
- Data sharing via Analytics Hub linked datasets
- ML models and routines (stored procedures/functions)
- Row-level and column-level security policies
This unlocks graph-based security analysis such as:
- Identifying datasets/tables with overly permissive access
- Tracking data lineage from source tables through views
- Detecting publicly accessible datasets
- Mapping which service accounts have access to sensitive tables
- Finding external tables that expose data from GCS buckets
- Auditing cross-project data sharing patterns
Proposed Solution
Extend the GCP intel module to call the BigQuery APIs and model the following resources:
New Nodes
Core Resources:
- GCPBigQueryDataset - Top-level container for tables/views
- GCPBigQueryTable - Data tables (native and external)
- GCPBigQueryView - Logical and materialized views
- GCPBigQueryRoutine - Functions, procedures, remote functions
- GCPBigQueryModel - BigQuery ML models
Access Control:
- GCPBigQueryRowAccessPolicy - Row-level security policies
Data Sharing (Analytics Hub):
- GCPBigQueryDataExchange - Analytics Hub exchanges
- GCPBigQueryListing - Published dataset listings
- GCPBigQueryLinkedDataset - Subscribed linked datasets
Connections:
- GCPBigQueryConnection - External data source connections (Cloud SQL, Spanner, etc.)
New Relationships
Hierarchy:
- (:GCPProject)-[:RESOURCE]->(:GCPBigQueryDataset)
- (:GCPBigQueryDataset)-[:CONTAINS]->(:GCPBigQueryTable)
- (:GCPBigQueryDataset)-[:CONTAINS]->(:GCPBigQueryView)
- (:GCPBigQueryDataset)-[:CONTAINS]->(:GCPBigQueryRoutine)
- (:GCPBigQueryDataset)-[:CONTAINS]->(:GCPBigQueryModel)
Access & Security:
- (:GCPBigQueryTable)-[:HAS_ROW_ACCESS_POLICY]->(:GCPBigQueryRowAccessPolicy)
- (:GCPBigQueryDataset)-[:ALLOWS_ACCESS]->(:GCPServiceAccount|GCPUser|GCPGroup) (via IAM bindings)
- (:GCPBigQueryTable)-[:ALLOWS_ACCESS]->(:GCPServiceAccount|GCPUser|GCPGroup) (table-level IAM)
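As one possible starting point for the dataset-level ALLOWS_ACCESS relationships, the sketch below flattens the `access` list of a dataset resource into one row per grant and flags grants to public principals. The function name `transform_dataset_access`, the output keys, and the treatment of `allUsers` / `allAuthenticatedUsers` as "public" are illustrative assumptions, not existing Cartography code. Authorized-view entries (which carry a `view` key instead of a principal) are deliberately skipped here.

```python
from typing import Any

# Assumption: these special identifiers mean "anyone" / "any signed-in Google account".
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}

# Keys of a dataset access entry that name a principal.
PRINCIPAL_KEYS = ("userByEmail", "groupByEmail", "domain", "specialGroup", "iamMember")


def transform_dataset_access(dataset: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a dataset's `access` list into one row per principal grant.

    Hypothetical helper: entries without a principal key (for example
    authorized views) are ignored.
    """
    rows: list[dict[str, Any]] = []
    for entry in dataset.get("access", []):
        for key in PRINCIPAL_KEYS:
            if key in entry:
                principal = entry[key]
                rows.append({
                    "principal_type": key,
                    "principal": principal,
                    "role": entry.get("role"),
                    "is_public": principal in PUBLIC_PRINCIPALS,
                })
                break  # use the first principal key found on the entry
    return rows
```

Rows with `is_public` set could back the "publicly accessible datasets" query from the Motivation section; matching each principal to an existing GCPServiceAccount, GCPUser, or GCPGroup node is left out of this sketch.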
Data Lineage & Dependencies:
- (:GCPBigQueryView)-[:REFERENCES]->(:GCPBigQueryTable) (view dependencies)
- (:GCPBigQueryView)-[:AUTHORIZED_FOR]->(:GCPBigQueryDataset) (authorized views)
- (:GCPBigQueryTable)-[:EXTERNAL_SOURCE]->(:GCSBucket) (external tables backed by GCS)
- (:GCPBigQueryTable)-[:USES_CONNECTION]->(:GCPBigQueryConnection) (BigLake/external connections)
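The EXTERNAL_SOURCE relationship needs a bucket name for each external table. A minimal sketch, assuming the table resource carries its source URIs under `externalDataConfiguration.sourceUris` and that GCS sources use the `gs://bucket/path` form; the helper name `get_external_gcs_buckets` is hypothetical:

```python
from typing import Any
from urllib.parse import urlparse


def get_external_gcs_buckets(table: dict[str, Any]) -> list[str]:
    """Return the distinct GCS bucket names an external table reads from.

    Non-GCS sources (for example Drive URLs) are ignored.
    """
    uris = table.get("externalDataConfiguration", {}).get("sourceUris", [])
    buckets = set()
    for uri in uris:
        parsed = urlparse(uri)
        # For gs://my-bucket/path/*.csv the bucket is the netloc component.
        if parsed.scheme == "gs" and parsed.netloc:
            buckets.add(parsed.netloc)
    return sorted(buckets)
```

Each returned name could then be matched against existing GCSBucket nodes; how those nodes are keyed is not covered by this sketch.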
Data Sharing:
- (:GCPBigQueryDataExchange)-[:HAS_LISTING]->(:GCPBigQueryListing)
- (:GCPBigQueryListing)-[:SHARES]->(:GCPBigQueryDataset)
- (:GCPBigQueryLinkedDataset)-[:SUBSCRIBED_TO]->(:GCPBigQueryListing)
Key Properties
GCPBigQueryDataset:
- id, dataset_id, project_id
- location, default_table_expiration_ms
- labels, description
- creation_time, last_modified_time
GCPBigQueryTable:
- id, table_id, dataset_id, project_id
- type (TABLE, VIEW, MATERIALIZED_VIEW, EXTERNAL, SNAPSHOT)
- location, num_bytes, num_rows
- creation_time, last_modified_time, expiration_time
- clustering_fields, time_partitioning
- encryption_configuration (CMEK)
GCPBigQueryView:
- id, view_id, dataset_id, project_id
- query (defining SQL)
- use_legacy_sql
- materialized (boolean)
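To illustrate how the GCPBigQueryDataset properties above might be derived, here is a hedged sketch of a transform step over a raw dataset resource. It assumes the API's camelCase field names (`datasetReference`, `defaultTableExpirationMs`, `creationTime`, `lastModifiedTime`) and that 64-bit integers arrive as JSON strings. The function names and the choice to flatten labels into `key:value` strings are illustrative assumptions, not a settled design.

```python
from typing import Any, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a string-encoded integer to int, passing None through."""
    return int(value) if value is not None else None


def transform_dataset(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a raw dataset resource to the proposed node properties (hypothetical)."""
    ref = raw.get("datasetReference", {})
    labels = raw.get("labels", {})
    return {
        "id": raw.get("id"),
        "dataset_id": ref.get("datasetId"),
        "project_id": ref.get("projectId"),
        "location": raw.get("location"),
        "default_table_expiration_ms": _to_int(raw.get("defaultTableExpirationMs")),
        # Assumption: store labels as a flat list of strings rather than a map.
        "labels": [f"{k}:{v}" for k, v in sorted(labels.items())],
        "description": raw.get("description"),
        "creation_time": _to_int(raw.get("creationTime")),
        "last_modified_time": _to_int(raw.get("lastModifiedTime")),
    }
```

Flattening labels is only one option; separate label nodes or per-label properties would work as well and may query better.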
GCP APIs to Integrate
- bigquery.googleapis.com - BigQuery API v2
  - datasets.list, datasets.get
  - tables.list, tables.get
  - routines.list, routines.get
  - models.list, models.get
  - rowAccessPolicies.list
- analyticshub.googleapis.com - Analytics Hub API
  - dataExchanges.list
  - listings.list
- bigqueryconnection.googleapis.com - BigQuery Connection API
  - connections.list
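The list calls above are paginated, so the sync would need to follow page tokens. A minimal sketch for datasets.list, assuming `bigquery` is a discovery-based client resource (for example one built with `googleapiclient.discovery.build("bigquery", "v2", ...)`); the function name `list_datasets` is hypothetical, and error handling and retries are omitted:

```python
from typing import Any


def list_datasets(bigquery: Any, project_id: str) -> list[dict[str, Any]]:
    """Collect every dataset summary in a project, following pagination."""
    datasets: list[dict[str, Any]] = []
    request = bigquery.datasets().list(projectId=project_id)
    while request is not None:
        response = request.execute()
        # The `datasets` key is absent when a page (or the project) has none.
        datasets.extend(response.get("datasets", []))
        # list_next returns None once there is no further page.
        request = bigquery.datasets().list_next(
            previous_request=request,
            previous_response=response,
        )
    return datasets
```

The same loop shape should apply to tables.list, routines.list, and models.list, with the dataset ID as an extra argument and a different response key. Since list responses contain only summaries, a follow-up get call per resource would likely be needed for fields such as access entries or view definitions.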
Alternatives Considered
- Using Cloud Asset Inventory for BigQuery - CAI provides basic metadata but misses table-level details, view definitions, and row access policies
- Focusing only on datasets - misses the table-level granularity needed for data security analysis
- Skipping Analytics Hub - would miss important cross-org data sharing patterns