etl
A compute graph for loading and transforming OWID's data
Historically, we've accessed a table from a dataset with `dataset["my_table"]`. Recently, a new helper method `dataset.read_table(reset_index: bool)` has been added that lets us read the table with reset...
## Problem The chart diff flow hasn't been designed for draft charts, so lingering draft charts can cause it to crash. ## Expected behaviour Draft charts should also be queued...
## Problem Currently, running `etl d version-tracker` raises an error, because, e.g. ``` * Missing step grapher://grapher/energy/2024-06-20/primary_energy_consumption is a dependency of the following active steps: export://multidim/energy/latest/energy ``` ## Expected behaviour...
## Background `uv` is a very fast Python package manager. It does some nice things: - It replaces `pyenv` by using its own redistributable Python versions - It picks package...
Refactor upserts to MySQL from `grapher://` step. This allows us to compare checksums of data & metadata by indicator and skip upserts to `variables` table in MySQL if metadata doesn't...
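The checksum comparison described above can be illustrated with a minimal sketch. It is not the ETL's actual implementation: the `checksum` and `indicators_to_upsert` helpers, the MD5 choice, and the sample metadata are all hypothetical, chosen only to show how unchanged indicators could be skipped.

```python
import hashlib
import json


def checksum(payload: dict) -> str:
    """Return a stable hex digest of a JSON-serialisable dict (hypothetical helper)."""
    # sort_keys makes the digest independent of key order.
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.md5(encoded).hexdigest()


def indicators_to_upsert(new_metadata: dict, stored_checksums: dict) -> list:
    """Return the indicator names whose metadata checksum differs from the stored one."""
    changed = []
    for name, metadata in new_metadata.items():
        if stored_checksums.get(name) != checksum(metadata):
            changed.append(name)
    return changed


# Checksums assumed to have been saved during a previous run.
stored = {"gdp": checksum({"unit": "USD", "title": "GDP"})}

# Metadata produced by the current run: "gdp" is unchanged, "population" is new.
incoming = {
    "gdp": {"title": "GDP", "unit": "USD"},
    "population": {"title": "Population", "unit": "people"},
}

result = indicators_to_upsert(incoming, stored)
print(result)
```

Only the indicators returned here would need an upsert; the rest can be skipped because their stored checksum already matches.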
Refactor Step paths like channel / namespace / version / name. Move the logic from bespoke functions to steps themselves and use `Step` properties to access them. Add `CatalogPath` to...
We have a single case where a public dataset (`data://garden/covid/latest/combined`, and hence our full covid dataset) depends on a private dataset, `data-private://garden/covid/latest/sequence`. ``` data://garden/covid/latest/combined: - data://garden/covid/latest/testing - data://garden/covid/latest/cases_deaths - data-private://garden/covid/latest/sequence - data://garden/demography/2024-07-15/population...
## Problem Some grapher steps can take annoyingly long, especially on a poor internet connection. For instance ``` etlr grapher://grapher/minerals/2024-07-15/minerals --grapher --force --only ``` can take anywhere between 20s and...
## Problem We use the Jinja2 templating engine in our metadata YAML files, especially in cases with dimensions, to avoid repeating the same phrases over and over. The problem is...