
Datasets should be aware of references in explorers

Open larsyencken opened this issue 2 years ago • 2 comments

Problem

We can't delete some datasets that appear unused, since they might be baked indirectly into explorers.

Context

When an author uploads a dataset, there is a checkbox option to enable automatic republishing to the owid-datasets repo as a CSV.

In turn, we have explorers that directly reference these GitHub CSVs. If we count them, we find 10:

owid-content/explorers (master *) » grep -l owid-datasets *
agricultural-productivity.explorer.tsv
crop-yields.explorer.tsv
csv_tests.explorer.tsv
energy-scenarios.explorer.tsv
fish-stocks.explorer.tsv
habitat-loss.explorer.tsv
impacts-of-energy-sources.explorer.tsv
natural-disasters.explorer.tsv
ukraine-russia-food.explorer.tsv
water-and-sanitation.explorer.tsv
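
For anyone wanting to run the same check from a script, here is a minimal Python sketch that is roughly equivalent to the `grep -l` call above. The function name `files_mentioning` is hypothetical and not part of any OWID tooling; it only scans regular files directly inside the given directory.

```python
from pathlib import Path


def files_mentioning(directory, needle="owid-datasets"):
    """Return sorted names of regular files in `directory` that contain `needle`.

    Hypothetical helper, roughly mirroring `grep -l owid-datasets *`.
    """
    matches = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories, as a non-recursive grep would
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text:
            matches.append(path.name)
    return matches
```

Calling `files_mentioning("explorers")` from a checkout of owid-content would then return the list of explorer files that mention the string, assuming the directory layout shown in the prompt above.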

Potential solutions

Disallow serving explorers from GitHub CSVs

This would seem to hide the problem: we would end up deleting the datasets and having more explorers whose data is not in grapher.

Update a live content graph on every change to owid-content

We could watch the owid-content repo and periodically update MySQL on live with content-graph changes, adding (and removing) links from explorers to baked datasets. These links would prevent accidental deletion thanks to foreign-key constraints, and would enable better reporting on the admin.
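
To illustrate how a foreign-key link can block accidental deletion, here is a small sketch. It uses an in-memory SQLite database as a stand-in for MySQL so that it runs on its own, and the table and column names (`datasets`, `explorer_dataset_links`, `explorer_slug`) are hypothetical, not the actual grapher schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a dataset table and an explorer link table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """
        CREATE TABLE explorer_dataset_links (
            explorer_slug TEXT NOT NULL,
            dataset_id INTEGER NOT NULL
                REFERENCES datasets(id) ON DELETE RESTRICT,
            PRIMARY KEY (explorer_slug, dataset_id)
        )
        """
    )
    return conn


def try_delete_dataset(conn, dataset_id):
    """Attempt to delete a dataset; return False if an explorer link blocks it."""
    try:
        conn.execute("DELETE FROM datasets WHERE id = ?", (dataset_id,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False
```

With a row in `explorer_dataset_links` pointing at a dataset, `try_delete_dataset` returns `False` until that link is removed, which is the behaviour the proposal relies on. The same link table could also back the admin reporting mentioned above.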

larsyencken, May 26 '22 07:05

  • Relates to #1338

larsyencken, May 26 '22 07:05

From discussion with @danyx23, this problem should get easier to fix once explorers can read metadata from static files or the API, because explorers will then have a unique variable identifier that we can reuse for this purpose.

larsyencken, Jun 07 '22 11:06

This issue has had no activity within 10 months. It is considered stale and will be closed in 7 days unless it is worked on or tagged as pinned.

github-actions[bot], Aug 10 '23 07:08

We actually did solve this one through automation that puts the explorer data model into MySQL.

larsyencken, Aug 18 '23 09:08