dbt-artifacts-loader
Load dbt artifacts uploaded to GCS to BigQuery in order to track historical dbt results
It enables us to persist dbt artifacts uploaded to GCS to BigQuery with Cloud Run.

How to use
- We push a docker image to Google Container Registry.
- We apply the terraform resources to launch the Cloud Run application.
- We upload dbt artifacts JSON files to Google Cloud Storage.
- We create the dbt models to efficiently analyze dbt artifacts.
Requirements
- docker
- terraform
- dbt-bigquery
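A quick way to confirm the tools are available locally (specific version requirements are not stated here):
# Check that the required tools are on the PATH
docker --version
terraform --version
dbt --version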
Push a docker image to Google Container Registry
Cloud Run accepts only docker images hosted on Google Cloud, that is, in Google Container Registry or Artifact Registry.
export project="YOUR-PROJECT-ID"
docker build --rm -f Dockerfile -t "gcr.io/${project}/dbt-artifacts-loader:latest" .
docker push "gcr.io/${project}/dbt-artifacts-loader:latest"
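If docker push fails with an authentication error, Docker may not yet be configured with your Google Cloud credentials; registering gcloud as the credential helper for gcr.io usually fixes this:
# Register gcloud as a Docker credential helper for gcr.io (only needed once)
gcloud auth configure-docker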
Apply the terraform resources to launch the Cloud Run application.
The terraform/example directory enables us to create required GCP resources and launch the Cloud Run application. It creates:
- a GCS bucket to store dbt artifacts JSON files,
- a BigQuery dataset and tables to store dbt artifacts,
- a Pub/Sub topic and a corresponding Pub/Sub subscription to notify Cloud Run of file uploads, and
- a Cloud Run application to load uploaded dbt artifacts to BigQuery.
The terraform project asks us to specify a GCP project ID in which to allocate the resources and a location for GCS, BigQuery, and Cloud Run.
Running terraform apply prompts us to enter the two values.
If you want to fix them, edit the default values in variables.tf.
cd terraform/example
terraform init
terraform apply
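If you prefer not to answer the prompt interactively, terraform also accepts the values as -var flags. The variable names below are assumptions; check variables.tf for the actual names.
# Pass the two values non-interactively (variable names are assumptions; see variables.tf)
terraform apply -var="project_id=YOUR-PROJECT-ID" -var="location=US"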
Upload dbt artifacts JSON files to Google Cloud Storage.
We have set up the Cloud Run application to load dbt artifacts JSON files to BigQuery.
Now, let's upload a run_results.json to GCS manually.
gsutil cp "target/run_results.json" "gs://${project}-dbt-artifacts/"
The repository itself does not provide a way to upload dbt artifacts JSON files. We recommend automating the upload of the JSON files to GCS after every dbt execution.
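For example, a small wrapper script along the following lines (a sketch, not part of this repository) runs dbt and then copies the generated artifacts to the bucket created by terraform:
# Sketch: run dbt, then upload the artifacts it writes to target/
export project="YOUR-PROJECT-ID"
dbt run
gsutil cp "target/run_results.json" "target/manifest.json" "gs://${project}-dbt-artifacts/"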
Create the dbt models to efficiently analyze dbt artifacts.
We are almost done. The last thing we have to do is to create BigQuery tables with dbt, so that we can efficiently analyze dbt artifacts.
To do that, we have to edit:
- the vars in dbt_project.yml, and
- the project ID to run dbt in profiles.yml.
cd dbt_artifacts_models
# Set up the environment for dbt
make setup
# Create and test the dbt models
make run && make test
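If make run fails with a connection error, dbt debug validates the profile configuration; the sketch below assumes profiles.yml lives in the dbt_artifacts_models directory:
# Validate the dbt profile and the BigQuery connection (assumes profiles.yml is in this directory)
dbt debug --profiles-dir .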