etl wizard: improve new data workflow

One-liner

Improvements to the new data update workflow.

Context & details

At high level, the new workflow is:

Staging preparation
- Create branch, PR
- Data work
- Push changes
Indicator upgrader
- User maps indicators to their new versions (as much as possible should be automated).
- Charts are updated using the new indicators.
Chart diff: on-going revision of changes in charts
- Show all charts that have changed in staging. Changes can be due to: (i) indicator changes, (ii) changes in chart config (from admin), (iii) changes in indicator values (data work in ETL) and (iv) changes in indicator metadata (data work in ETL).
- When all charts are approved, owidbot should signal this in its comment in the PR
- Once all looks fine, merge PR.
After PR merge

Charts are automatically updates in production.
If there is any conflict, a manual revision is needed. Maybe use old chart approval here?

Note that there can be data work at any point between steps 1 and 2, hence chart-diff should also reflect chart changes due to

Future work

Indicator Upgrade

[x] Smarter detection and presentation of new datasets:
- [x] Compare changes between master and new-branch
- [x] Extract list of new grapher datasets
- [x] For each new grapher dataset, sort the list of datasets in "old dataset". Ideally should be: 1) older version(s) of dataset if possible, 2) fuzzy match of URIs.
- PR: https://github.com/owid/etl/pull/2595
[x] Present the various "dataset migrations"
- [x] In the "new dataset" dropdown, only list those datasets that are new. By choosing one there, one is specifying the "dataset migration" they want to make. Show first the biggest one? Show last those that are already in use (at least one chart uses them = migration previously done?)
- PR: https://github.com/owid/etl/pull/2595
[ ] QA & other improvements
- [ ] PR: https://github.com/owid/etl/issues/2610

Chart diff

[ ] What happens if: (i) approve all charts, (ii) edit one chart in admin from staging, (iii) merge the PR without going back to chart-diff. Is that edited chart approved and therefore pushed to production? Or is it tagged as unapproved?
[ ] Support for deleted charts
- Maybe we could start by just signaling this.
[ ] Add 'explore mode' to chart-diff. Now, if a chart-diff appears due to data or metadata change, it is hard to understand where the discrepancies are coming from. We could have an explore mode, similar to indicator upgrader's, where the user can compare the data and the metadata. The 'explorer mode' should be a multi-indicator one.
- [ ] Data comparison. Essentially the same as in indicator upgrader.
- [x] A diff view between the metadata from old and new indicators.
[x] Support for draft charts
[x] Show chart changes due to indicator changes in values or metadata (work in ETL). Currently chart diff only shows charts that have a different config compared to production. NOTE: as long as they've been edited via Admin API (admin page, or Admin API in python).
- PR: https://github.com/owid/etl/pull/2752
- [x] https://github.com/owid/etl/issues/2726
[x] Stress test it: what happens if we have 100 charts? 200? Do we need pagination?
[x] Run chart diff periodically against all open PRs in ETL and post results back with owidbot
- [x] Is it too much to run everytime there is a change in the approval table?
[x] ~Track indicator version updates. E.g. a table mapping indicator IDs (old → new)~ Not doing this for now.
[x] Enable 'reset chart' in case you accidentally edited one chart that you should have not.
- Solved by reject button
[x] Add a conflict resolver option: show as a form with fields that do bot match.
- https://github.com/owid/etl/pull/2771
[x] ~Add the link to page at the top (easy to copy), if a subset of charts are presented~ Won't do
[x] fix key bug for new charts
[x] More info for each chart diff:
- [x] ~dataset name~ WON'T DO FOR NOW
- [x] draft?
- [x] Change type
[x] Present modified and new charts together. filter them in the panel or via query params.
[x] ~display state change as a plot (only if fast enough)~ good as-is

After merging branch

[x] After merging the PR, there should be an automated chart-sync call (no need to run it manually)
- [x] PR: https://github.com/owid/etl/pull/2615
- [ ] Run it for real without --dry-run
[ ] Merge conflicts in production
- Example of conflict
  - Data manager edits chart A on staging → they merge the PR from staging server into master
  - Another person edits the chart A in production before ETL builds → collision!
  - There are two versions for chart A, we should resolve this.
- [ ] Maybe use existing old chart-approval flow for these conflicts?

Docs

[x] MVP documentation with new workflow. Update documentation to reflect data workflow
- guides to add, update data, update charts
- guides on wizard maintenance
- PR: https://github.com/owid/etl/pull/2741

Detected bugs

[x] Indicator upgrader showing indicators no longer in use
- Recreate: (i) run indicator upgrade & update charts, (ii) charts are visible from chart diff, (iii) if you go back to indicator upgrade, indicators no longer in use are still shown
[x] Chart diff is not rendering charts in staging. It seems as if charts recently updated in staging, can't be rendered? Endless spinner shown in the chart area.
[ ] Charts shown in chart diff might not reflect the latest available. If one goes from one wizard page to another and comes back, one might still need to click on 'refresh all charts'.

PR list

General:
- https://github.com/owid/etl/pull/2706
- https://github.com/owid/etl/pull/2725
- https://github.com/owid/etl/pull/2560
- https://github.com/owid/etl/pull/2741
- https://github.com/owid/etl/pull/2743
- https://github.com/owid/etl/pull/2763
- https://github.com/owid/etl/pull/2797
- https://github.com/owid/etl/pull/2800
- https://github.com/owid/etl/pull/2829
- https://github.com/owid/etl/pull/2818
Chart diff:
- https://github.com/owid/etl/pull/2752
- https://github.com/owid/etl/pull/2757
- https://github.com/owid/etl/pull/2771
- https://github.com/owid/etl/pull/2752
- https://github.com/owid/etl/pull/2787
- https://github.com/owid/etl/pull/2791
- https://github.com/owid/etl/pull/2802
- https://github.com/owid/etl/pull/2801 (used it to test some fixes)
- https://github.com/owid/etl/pull/2831
- https://github.com/owid/etl/pull/2882
- https://github.com/owid/etl/pull/2892
- https://github.com/owid/etl/pull/2895
Chart sync:
- https://github.com/owid/etl/pull/2615
- https://github.com/owid/etl/pull/2861
- https://github.com/owid/etl/pull/2890
Indicator upgrade:
- https://github.com/owid/etl/pull/2734
- https://github.com/owid/etl/pull/2595
- https://github.com/owid/etl/pull/2859

Apr 30 '24 13:04 lucasrodes

This plan sounds good! I just created a PR with a tool I was playing around with to be able to detect automatically the grapher dataset mapping old-> new based on your current branch. I think some of that logic can be implemented in indicator upgrader (I'm happy to do that myself down the line).

Also, some other part of the logic of that tool (finding all possible files affected by your local work) could also be part of chart diff, but that's just nice to have (for now chart diff should relay on chart config differences).

May 02 '24 13:05 pabloarosado

I just finished an update of WDI with about 500 charts. It was pretty smooth, but not perfect. Some observations:

Loading charts is slow on my ancient laptop. It's likely due to too many iframes being loaded. There might be a way how to avoid it, but I haven't tried it yet
About 100 charts were identical, but I still had to approve them
- The problem is that detecting if charts are identical isn't as trivial as it first seems
Having Reject button is necessary
When someone updates a chart in production, I have to manually make those changes in admin on staging and then approve. Having a "reset" button would help, but in the end manually editing a chart wasn't as bad
It's way better than chart revisions!!!

May 27 '24 14:05 Marigold

Minor thing: I think it would be convenient to show if a chart is a draft in chart diff.

May 29 '24 07:05 pabloarosado

Two nice to have ideas from doing large GBD review:

When sharing an issue with charts, I manually add links to both charts (example). Having a shareable URL to a single chart-diff would be nice (it could be a new page with chart slug in URL and show just the two charts)
Imagine you are updating 500 charts and then someone edits 10 of them. You can:
- Manually update all of them (with help of config diff)
- Reject those charts and after merging your PR, run indicator upgrader again in production (or create a new staging server and do all there)

Is there an easier way? Maybe we could have a "reset" button for a chart that would reset config from production, but keep variable ids. Not sure how hard it would be to implement.

Jun 03 '24 07:06 Marigold

Regarding the first point @Marigold made above, an alternative implementation could be to use the test page, in the format http://staging-site-{branch}/admin/test/embeds?ids={chart-id}&comparisonUrl=https%3A%2F%2Fourworldindata.org

I used that in a PIP issue:

Jun 04 '24 16:06 paarriagadap

[this list has been integrated into main issue list]

more points:

update documentation to reflect data workflow
- guides to add, update data, update charts
- guides on wizard maintenance
add a conflict resolver option: show as a form with fields that do bot match.
add the link to page at the top (easy to copy), if a subset of charts are presented
fix key bug for new charts
more info for each chart diff: dataset name, draft?
present modified and new charts together. filter them in the panel or via query params.
display state change as a plot

Jun 05 '24 22:06 lucasrodes

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Aug 25 '24 11:08 stale[bot]

etl etl copied to clipboard

wizard: improve new data workflow

One-liner

Context & details

Future work

Indicator Upgrade

Chart diff

After merging branch

Docs

Detected bugs

PR list

etl
etl copied to clipboard