Adler Santos
## Description A common gotcha is when the Airflow variables are incorrectly set or missing in the `.dev|.staging|.prod` folders. We can catch this error upstream by checking if the variables...
## Description We're using conventional commits, and when contributors aren't aware of this, they write commit messages that eventually error out when GitHub Actions run. To help catch...
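As a rough illustration of catching a non-conforming message before CI does, here is a minimal sketch of a local check. It assumes the common `type(scope)!: subject` header shape from the Conventional Commits specification; the `ALLOWED_TYPES` list and the `is_conventional` helper are hypothetical and not taken from this repository's actual configuration.

```python
import re

# Hypothetical list of allowed types; the project's real list may differ.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# Header shape: type, optional (scope), optional "!", then ": " and a subject.
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def is_conventional(message):
    """Return True if the first line of the commit message looks conventional."""
    first_line = message.splitlines()[0] if message else ""
    match = HEADER_PATTERN.match(first_line)
    return bool(match) and match.group("type") in ALLOWED_TYPES


print(is_conventional("feat: add gnomAD pipeline"))
print(is_conventional("Updated the README"))
```

A check like this could run from a local Git `commit-msg` hook so that contributors see the problem before pushing.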
## Description Some Terraform resource properties can safely change after deployment, e.g. the `label` property on BQ tables. We should support Terraform states not having to keep changing when such...
## Description To prevent duplicating BigQuery schema definitions in YAML config files (say, when multiple BigQuery tables use the same schema), it might be nice to allow users to create...
## Description In order to DRY out values commonly specified in a YAML config (e.g. the dataset name or Airflow data folder), let’s support self-referencing variables in YAML. This is...
## Description The concept of a **dataset** is starting to become an overloaded term. It could mean the following: * A BigQuery dataset which is a collection of tables. This...
## Description Instead of going through the pipeline development instructions for simple pipelines (such as a simple CSV to BigQuery dump), we can use a templating tool that creates all...
## Description CSV column name remapping is a very common use case for data transforms. To prevent contributors from having to keep writing a custom script to rename CSV columns,...
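To make the use case concrete, the following is a minimal sketch of a reusable column-renaming step using only the Python standard library. The function name `rename_csv_columns`, the `rename_mappings` parameter, and the sample column names are assumptions made for illustration; they are not the interface the issue proposes.

```python
import csv
import io


def rename_csv_columns(source, target, rename_mappings):
    """Copy CSV rows from source to target, renaming header columns.

    Columns that are not keys of rename_mappings keep their original names.
    """
    reader = csv.reader(source)
    writer = csv.writer(target, lineterminator="\n")
    header = next(reader)  # the first row is assumed to be the header
    writer.writerow([rename_mappings.get(name, name) for name in header])
    writer.writerows(reader)  # data rows pass through unchanged


# Hypothetical sample data and mapping.
source = io.StringIO("Station ID,Temp (C)\nA1,21.5\n")
target = io.StringIO()
rename_csv_columns(
    source,
    target,
    {"Station ID": "station_id", "Temp (C)": "temperature_c"},
)
print(target.getvalue())
```

Driving the rename from a mapping rather than a hand-written script means the mapping itself could live in a pipeline's config file.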
## Description We expect more and more pipelines to include custom scripts (e.g. transforming CSV files, reading `.shp` or `.nc` files, scraping websites, and so on) and we need to...
## Description The gnomAD pipeline was previously set to run ad hoc. The Broad Institute now requires us to run it more frequently, so I have set the DAG to run daily.