dbt-bigquery-monitoring
dbt package to monitor BigQuery assets (tables & queries)
🚧 The package is still early stage and may change significantly 🚧
dbt-bigquery-monitoring is a dbt package that provides models for monitoring BigQuery performance and costs.
Get started
Granting rights
To use this package, you will need to grant permissions to the Service Account that dbt uses to connect to BigQuery.
The required permissions (listed below) are included in the BigQuery Resource Admin and BigQuery Admin roles, but feel free to create a custom role.
If you prefer to use a custom role, grant it the following permissions:
- bigquery.tables.get - To access BigQuery tables data
- bigquery.tables.list - To access BigQuery tables data
- bigquery.jobs.listAll
- At the organization or project level, depending on desired scope
- Note that JOBS_BY_ORGANIZATION is only available to users with defined Google Cloud organizations. More information on permissions and access control is available in the BigQuery documentation.
- bigquery.reservations.list - To access BigQuery Reservations data
- bigquery.capacityCommitments.list - To access BigQuery Reservations data
- bigquery.reservationAssignments.list - To access BigQuery Reservations data
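If you go the custom-role route, one way to bundle the permissions above is a role definition file passed to the gcloud CLI. A sketch (the role ID, title, and project are placeholders):

```yaml
# role.yaml — hypothetical custom role covering the permissions listed above
# Create it with: gcloud iam roles create dbtBigQueryMonitoring --project=my-gcp-project --file=role.yaml
title: dbt BigQuery Monitoring
description: Read-only access required by dbt-bigquery-monitoring
stage: GA
includedPermissions:
  - bigquery.tables.get
  - bigquery.tables.list
  - bigquery.jobs.listAll
  - bigquery.reservations.list
  - bigquery.capacityCommitments.list
  - bigquery.reservationAssignments.list
```

Then grant the role to the Service Account dbt uses, at the project or organization level depending on the desired scope.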
Installing the package to your dbt project
Add the following to your packages.yml file:
packages:
- package: bqbooster/dbt_bigquery_monitoring
version: 0.3.0
Configure the package
Most settings have default values that can be overridden using:
- dbt project variables (and therefore also by CLI variable override)
- environment variables
However, a few settings have no default, so you need to set them explicitly in your project variables or environment variables.
Here are the mandatory settings to define in the dbt_project.yml file:
vars:
# dbt bigquery monitoring vars
input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
Settings details
The following settings are listed as dbt_project_variable (environment variable).
Required settings
- input_gcp_projects (DBT_BQ_MONITORING_GCP_PROJECTS): list of GCP projects to monitor
Optional settings
- bq_region (DBT_BQ_MONITORING_REGION): region where the monitored projects are located (default: us)
- use_flat_pricing (DBT_BQ_MONITORING_USE_FLAT_PRICING): whether to use flat pricing or not (default: true)
- per_billed_tb_price (DBT_BQ_MONITORING_PER_BILLED_TB_PRICE): price per billed TB (default: 5)
- free_tb_per_month (DBT_BQ_MONITORING_FREE_TB_PER_MONTH): free TB per month (default: 1)
- hourly_slot_price (DBT_BQ_MONITORING_HOURLY_SLOT_PRICE): price per slot per hour (default: 0.04)
- active_logical_storage_gb_price (DBT_BQ_MONITORING_ACTIVE_LOGICAL_STORAGE_GB_PRICE): price per active logical storage GB (default: 0.02)
- long_term_logical_storage_gb_price (DBT_BQ_MONITORING_LONG_TERM_LOGICAL_STORAGE_GB_PRICE): price per long-term logical storage GB (default: 0.01)
- active_physical_storage_gb_price (DBT_BQ_MONITORING_ACTIVE_PHYSICAL_STORAGE_GB_PRICE): price per active physical storage GB (default: 0.04)
- long_term_physical_storage_gb_price (DBT_BQ_MONITORING_LONG_TERM_PHYSICAL_STORAGE_GB_PRICE): price per long-term physical storage GB (default: 0.02)
- free_storage_gb_per_month (DBT_BQ_MONITORING_FREE_STORAGE_GB_PER_MONTH): free storage GB per month (default: 10)
- lookback_window_days (DBT_BQ_MONITORING_LOOKBACK_WINDOW_DAYS): number of days to look back for monitoring (default: 1)
- output_materialization (DBT_BQ_MONITORING_OUTPUT_MATERIALIZATION): materialization to use for the models (default: table)
- output_limit_size (DBT_BQ_MONITORING_OUTPUT_LIMIT_SIZE): limit size to use for the models (default: 1000)
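As an illustration, optional settings can be overridden alongside the mandatory one in dbt_project.yml (the values below are examples, not recommendations):

```yaml
vars:
  # dbt bigquery monitoring vars
  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
  bq_region: 'EU'             # monitored projects live in the EU multi-region
  use_flat_pricing: false     # cost models use on-demand pricing instead
  lookback_window_days: 7     # look one week back instead of one day
  output_materialization: 'view'
```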
Add metadata to queries (recommended but optional)
To enhance your query metadata with dbt model information, the package provides a dedicated macro that leverages dbt query comments (the header set at the top of each query).
To configure the query comments, add the following config to dbt_project.yml.
query-comment:
comment: '{{ dbt_bigquery_monitoring.get_query_comment(node) }}'
Running the package
The package is designed to be run as a daily or hourly job. To do so, you can use the following dbt command:
dbt run -s tag:dbt-bigquery-monitoring
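For example, a daily schedule could be wired up with a plain cron entry (a sketch; the project path and log file are placeholders, and any scheduler — cron, Airflow, dbt Cloud — works equally well):

```
# Hypothetical crontab entry: refresh the monitoring models every day at 06:00
0 6 * * * cd /path/to/your/dbt/project && dbt run -s tag:dbt-bigquery-monitoring >> /var/log/dbt_bigquery_monitoring.log 2>&1
```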
Using the package
The package provides the following datamarts that can be easily used to build monitoring charts and dashboards:
- global
  - daily_spend
- compute
  - compute_cost_per_hour
  - most_expensive_jobs
  - most_expensive_users
  - most_repeated_jobs
  - slowest_jobs
- storage
  - most_expensive_tables
  - read_heavy_tables
  - unused_tables
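Once the models are built, the datamarts can be queried directly from BigQuery. A minimal sketch, assuming the package is materialized in a dataset named dbt_bigquery_monitoring (project and dataset names are placeholders):

```sql
-- Inspect the most recent daily spend rows (ordering by the first column as a generic sketch)
SELECT *
FROM `my-gcp-project.dbt_bigquery_monitoring.daily_spend`
ORDER BY 1 DESC
LIMIT 30;
```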
Contributing
If you feel like contributing, don't hesitate to open an issue and submit a PR.
Set up a profile
To run the package in development mode (i.e. from this repository instead of through an installed package), you will need to set up a profile to connect to BigQuery.
The profile used is for the project dbt_bigquery_monitoring and can be configured as follows for a production account using a service account keyfile:
dbt_bigquery_monitoring:
outputs:
default:
type: bigquery
## Service account auth ##
method: service-account
keyfile: [full path to your keyfile]
project: [project id] # storage project
execution_project: [execution project id] # execution project
dataset: [dataset name] # dbt_bigquery_monitoring dataset, you may just use dbt_bigquery_monitoring
threads: 4
location: [dataset location]
priority: interactive
timeout_seconds: 1000000
If you're running locally to try the package, you can swap the method to method: oauth (and remove the keyfile line).
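For example, a local development profile using oauth might look like this (all values are placeholders):

```yaml
dbt_bigquery_monitoring:
  outputs:
    default:
      type: bigquery
      ## Local oauth auth (no keyfile needed) ##
      method: oauth
      project: my-gcp-project            # storage project
      dataset: dbt_bigquery_monitoring
      threads: 4
      location: US
      priority: interactive
  target: default
```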