owid-grapher
Datasets should be aware of references in explorers
Problem
We can't delete some datasets that appear unused, since they might be baked indirectly into explorers.
Context
When an author uploads a dataset, there is a checkbox option to enable automatic republishing to the owid-datasets repo as a CSV.
In turn, we have explorers that directly reference these GitHub CSVs. If we count them, we find 10:
owid-content/explorers (master *) » grep -l owid-datasets *
agricultural-productivity.explorer.tsv
crop-yields.explorer.tsv
csv_tests.explorer.tsv
energy-scenarios.explorer.tsv
fish-stocks.explorer.tsv
habitat-loss.explorer.tsv
impacts-of-energy-sources.explorer.tsv
natural-disasters.explorer.tsv
ukraine-russia-food.explorer.tsv
water-and-sanitation.explorer.tsv
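To go from "which explorers mention owid-datasets" to "which datasets are referenced", something like the sketch below could work. It is only an illustration: the URL shape (`.../owid-datasets/<branch>/datasets/<dataset name>/<file>.csv`) is an assumption about how explorers link to the CSVs, and `referenced_datasets` and `scan_explorers` are hypothetical helper names, not existing code.

```python
import re
from pathlib import Path
from urllib.parse import unquote

# Assumed URL shape (not confirmed against the real explorer files):
#   .../owid-datasets/<branch>/datasets/<dataset name>/<file>.csv
DATASET_URL = re.compile(
    r"owid-datasets/[^/\s]+/datasets/([^/\t\n]+)/[^/\t\n]+\.csv"
)


def referenced_datasets(explorer_text: str) -> set[str]:
    """Return the (URL-decoded) dataset names referenced in one explorer file."""
    return {unquote(m.group(1)) for m in DATASET_URL.finditer(explorer_text)}


def scan_explorers(directory: Path) -> dict[str, set[str]]:
    """Map each *.explorer.tsv file name to the dataset names it references."""
    refs: dict[str, set[str]] = {}
    for path in sorted(directory.glob("*.explorer.tsv")):
        names = referenced_datasets(path.read_text(encoding="utf-8"))
        if names:  # skip explorers with no owid-datasets references
            refs[path.name] = names
    return refs
```

Running `scan_explorers(Path("."))` from the explorers directory would give a per-explorer list of dataset names that could be checked before a dataset is deleted.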
Potential solutions
Disallow serving explorers from GitHub CSVs
This seems to only hide the problem: we would end up deleting the datasets anyway, and having more explorers whose data is not in grapher.
Update a live content graph on every change to owid-content
We could watch the owid-content repo and periodically update MySQL on live with content-graph changes, adding (and removing) links from explorers to baked datasets. These links would prevent accidental deletion thanks to foreign-key constraints, and would enable better reporting on the admin.
- Relates to #1338
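A rough sketch of how link rows plus a foreign-key constraint would block deletion. This uses in-memory SQLite as a stand-in for MySQL so it can run anywhere; the table names (`datasets`, `explorer_dataset_links`) and the functions are hypothetical and do not reflect the real grapher schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a datasets table and a link table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE datasets (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE explorer_dataset_links (
            explorer_slug TEXT NOT NULL,
            dataset_id INTEGER NOT NULL
                REFERENCES datasets(id) ON DELETE RESTRICT,
            PRIMARY KEY (explorer_slug, dataset_id)
        );
        """
    )
    return conn


def sync_explorer_links(conn: sqlite3.Connection, explorer_slug: str,
                        dataset_names: set[str]) -> None:
    """Replace all links for one explorer with the currently referenced datasets."""
    with conn:  # one transaction: remove stale links, then add current ones
        conn.execute(
            "DELETE FROM explorer_dataset_links WHERE explorer_slug = ?",
            (explorer_slug,),
        )
        for name in sorted(dataset_names):
            conn.execute(
                "INSERT INTO explorer_dataset_links (explorer_slug, dataset_id) "
                "SELECT ?, id FROM datasets WHERE name = ?",
                (explorer_slug, name),
            )


def try_delete_dataset(conn: sqlite3.Connection, name: str) -> bool:
    """Attempt the delete; return False if an explorer link blocks it."""
    try:
        with conn:
            conn.execute("DELETE FROM datasets WHERE name = ?", (name,))
        return True
    except sqlite3.IntegrityError:
        return False
```

The periodic job would call `sync_explorer_links` once per explorer after each change to owid-content. Once an explorer stops referencing a dataset, its link row disappears on the next sync and the dataset becomes deletable again.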
From discussion with @danyx23, this problem seems like it will get easier to fix once explorers can read metadata from static files / the API, because explorers will then have a unique variable identifier that we can reuse for this purpose.
This issue has had no activity within 10 months. It is considered stale and will be closed in 7 days unless it is worked on or tagged as pinned.
We actually did solve this one through automation that puts the explorer data model into MySQL.