dbt-bigquery-monitoring icon indicating copy to clipboard operation
dbt-bigquery-monitoring copied to clipboard

dbt package to monitor BigQuery assets (tables & queries)

dbt-bigquery-monitoring

🚧 The package is still early stage and might vary a lot 🚧

dbt-bigquery-monitoring is a dbt package that provides models for monitoring BigQuery performance and costs.

Get started

Granting rights

To use this package, you will need to grant permissions to the Service Account that dbt uses to connect to BigQuery.

The required permissions (listed above) are available to:

if you prefer to use custom roles, you can use the following permissions.

Installing the package to your dbt project

Add the following to your packages.yml file:

packages:
  - package: bqbooster/dbt_bigquery_monitoring
    version: 0.3.0

Configure the package

A lot of settings have default values that can be overriden using:

  • dbt project variables (and therefore also by CLI variable override)
  • environment variables

However some of them don't so you need to set all of them in your project variables or environment variables.

Here is the mandatory settings to set in the dbt_project.yml file:

vars:
  # dbt bigquery monitoring vars
  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]
Settings details

Following settings are defined as dbt_project_variable (Environment variable).

Required settings

  • input_gcp_projects (DBT_BQ_MONITORING_GCP_PROJECTS) : list of GCP projects to monitor

Optional settings

  • bq_region (DBT_BQ_MONITORING_REGION) : region where the monitored projects are located (default: us)
  • use_flat_pricing (DBT_BQ_MONITORING_USE_FLAT_PRICING) : whether to use flat pricing or not (default: true)
  • per_billed_tb_price (DBT_BQ_MONITORING_PER_BILLED_TB_PRICE) : price per billed TB (default: 5)
  • free_tb_per_month (DBT_BQ_MONITORING_FREE_TB_PER_MONTH) : free TB per month (default: 1)
  • hourly_slot_price (DBT_BQ_MONITORING_HOURLY_SLOT_PRICE) : price per slot per hour (default: 0.04)
  • active_logical_storage_gb_price (DBT_BQ_MONITORING_ACTIVE_LOGICAL_STORAGE_GB_PRICE) : price per active logical storage GB (default: 0.02)
  • long_term_logical_storage_gb_price (DBT_BQ_MONITORING_LONG_TERM_LOGICAL_STORAGE_GB_PRICE) : price per long term logical storage GB (default: 0.01)
  • active_physical_storage_gb_price (DBT_BQ_MONITORING_ACTIVE_PHYSICAL_STORAGE_GB_PRICE) : price per active physical storage GB (default: 0.04)
  • long_term_physical_storage_gb_price (DBT_BQ_MONITORING_LONG_TERM_PHYSICAL_STORAGE_GB_PRICE) : price per long term physical storage GB (default: 0.02)
  • free_storage_gb_per_month (DBT_BQ_MONITORING_FREE_STORAGE_GB_PER_MONTH) : free storage GB per month (default: 10)
  • lookback_window_days (DBT_BQ_MONITORING_LOOKBACK_WINDOW_DAYS) : number of days to look back for monitoring (default: 1)
  • output_materialization (DBT_BQ_MONITORING_OUTPUT_MATERIALIZATION) : materialization to use for the models (default: table)
  • output_limit_size (DBT_BQ_MONITORING_OUTPUT_LIMIT_SIZE) : limit size to use for the models (default: 1000)

Add metadata to queries (Recommanded but optional)

To enhance your query metadata with dbt model information, the package provides a dedicated macro that leverage "dbt query comments" (the header set at the top of each query) To configure the query comments, add the following config to dbt_project.yml.

query-comment:
  comment: '{{ dbt_bigquery_monitoring.get_query_comment(node) }}'

Running the package

The package is designed to be run as a daily or hourly job. To do so, you can use the following dbt command:

dbt run -s tag:dbt-bigquery-monitoring

Using the package

The package provides the following datamarts that can be easily used to build monitoring charts and dashboards:

  • global

    • daily_spend
  • compute

    • compute_cost_per_hour
    • most_expensive_jobs
    • most_expensive_users
    • most_repeated_jobs
    • slowest_jobs
  • storage

    • most_expensive_tables
    • read_heavy_tables
    • unused_tables

Contributing

If you feel like contribute, don't hesitate to open an issue and submit a PR.

Setup a profile

To run the package in development mode (ie from that repository instead of through an installed package), you will need to setup a profile that will be used to connect to BigQuery.

The profile used is for the project dbt_bigquery_monitoring and can be configured as follow for a production account using a service account keyfile:

dbt_bigquery_monitoring:
  outputs:
    default:
      type: bigquery

      ## Service account auth ##
      method: service-account
      keyfile: [full path to your keyfile]

      project: [project id] # storage project
      execution_project: [execution project id] # execution project
      dataset: [dataset name] # dbt_bigquery_monitoring dataset, you may just use dbt_bigquery_monitoring
      threads: 4
      location: [dataset location]
      priority: interactive

      timeout_seconds: 1000000

if you're running locally to try the package you can swap the method to method: oauth (and remove the keyfile line).