etl
etl copied to clipboard
wizard: improve new data workflow
One-liner
Improvements to the new data update workflow.
Context & details
At high level, the new workflow is:
- Staging preparation
- Create branch, PR
- Data work
- Push changes
-
Indicator upgrader
- User maps indicators to their new versions (as much as possible should be automated).
- Charts are updated using the new indicators.
-
Chart diff: on-going revision of changes in charts
- Show all charts that have changed in staging. Changes can be due to: (i) indicator changes, (ii) changes in chart config (from admin), (iii) changes in indicator values (data work in ETL) and (iv) changes in indicator metadata (data work in ETL).
- When all charts are approved,
owidbot
should signal this in its comment in the PR - Once all looks fine, merge PR.
- After PR merge
- Charts are automatically updates in production.
- If there is any conflict, a manual revision is needed. Maybe use old chart approval here?
Note that there can be data work at any point between steps 1 and 2, hence chart-diff should also reflect chart changes due to
Future work
Indicator Upgrade
- [x] Smarter detection and presentation of new datasets:
- [x] Compare changes between
master
andnew-branch
- [x] Extract list of new grapher datasets
- [x] For each new grapher dataset, sort the list of datasets in "old dataset". Ideally should be: 1) older version(s) of dataset if possible, 2) fuzzy match of URIs.
- PR: https://github.com/owid/etl/pull/2595
- [x] Compare changes between
- [x] Present the various "dataset migrations"
- [x] In the "new dataset" dropdown, only list those datasets that are new. By choosing one there, one is specifying the "dataset migration" they want to make. Show first the biggest one? Show last those that are already in use (at least one chart uses them = migration previously done?)
- PR: https://github.com/owid/etl/pull/2595
- [ ] QA & other improvements
- [ ] PR: https://github.com/owid/etl/issues/2610
Chart diff
- [ ] What happens if: (i) approve all charts, (ii) edit one chart in admin from staging, (iii) merge the PR without going back to chart-diff. Is that edited chart approved and therefore pushed to production? Or is it tagged as unapproved?
- [ ] Support for deleted charts
- Maybe we could start by just signaling this.
- [ ] Add 'explore mode' to chart-diff. Now, if a chart-diff appears due to data or metadata change, it is hard to understand where the discrepancies are coming from. We could have an explore mode, similar to indicator upgrader's, where the user can compare the data and the metadata. The 'explorer mode' should be a multi-indicator one.
- [ ] Data comparison. Essentially the same as in indicator upgrader.
- [x] A diff view between the metadata from old and new indicators.
- [x] Support for draft charts
- [x] Show chart changes due to indicator changes in values or metadata (work in ETL). Currently chart diff only shows charts that have a different config compared to production. NOTE: as long as they've been edited via Admin API (admin page, or Admin API in python).
- PR: https://github.com/owid/etl/pull/2752
- [x] https://github.com/owid/etl/issues/2726
- [x] Stress test it: what happens if we have 100 charts? 200? Do we need pagination?
- [x] Run
chart diff
periodically against all open PRs in ETL and post results back with owidbot- [x] Is it too much to run everytime there is a change in the approval table?
- [x] ~Track indicator version updates. E.g. a table mapping indicator IDs (old → new)~ Not doing this for now.
- [x] Enable 'reset chart' in case you accidentally edited one chart that you should have not.
- Solved by
reject
button
- Solved by
- [x] Add a conflict resolver option: show as a form with fields that do bot match.
- https://github.com/owid/etl/pull/2771
- [x] ~Add the link to page at the top (easy to copy), if a subset of charts are presented~ Won't do
- [x] fix key bug for new charts
- [x] More info for each chart diff:
- [x] ~dataset name~ WON'T DO FOR NOW
- [x] draft?
- [x] Change type
- [x] Present modified and new charts together. filter them in the panel or via query params.
- [x] ~display state change as a plot (only if fast enough)~ good as-is
After merging branch
- [x] After merging the PR, there should be an automated chart-sync call (no need to run it manually)
- [x] PR: https://github.com/owid/etl/pull/2615
- [ ] Run it for real without
--dry-run
- [ ] Merge conflicts in production
- Example of conflict
- Data manager edits
chart A
on staging → they merge the PR from staging server into master - Another person edits the
chart A
in production before ETL builds → collision! - There are two versions for
chart A
, we should resolve this.
- Data manager edits
- [ ] Maybe use existing old chart-approval flow for these conflicts?
- Example of conflict
Docs
- [x] MVP documentation with new workflow. Update documentation to reflect data workflow
- guides to add, update data, update charts
- guides on wizard maintenance
- PR: https://github.com/owid/etl/pull/2741
Detected bugs
- [x] Indicator upgrader showing indicators no longer in use
- Recreate: (i) run indicator upgrade & update charts, (ii) charts are visible from chart diff, (iii) if you go back to indicator upgrade, indicators no longer in use are still shown
- [x] Chart diff is not rendering charts in staging. It seems as if charts recently updated in staging, can't be rendered? Endless spinner shown in the chart area.
- [ ] Charts shown in chart diff might not reflect the latest available. If one goes from one wizard page to another and comes back, one might still need to click on 'refresh all charts'.
PR list
- General:
- https://github.com/owid/etl/pull/2706
- https://github.com/owid/etl/pull/2725
- https://github.com/owid/etl/pull/2560
- https://github.com/owid/etl/pull/2741
- https://github.com/owid/etl/pull/2743
- https://github.com/owid/etl/pull/2763
- https://github.com/owid/etl/pull/2797
- https://github.com/owid/etl/pull/2800
- https://github.com/owid/etl/pull/2829
- https://github.com/owid/etl/pull/2818
- Chart diff:
- https://github.com/owid/etl/pull/2752
- https://github.com/owid/etl/pull/2757
- https://github.com/owid/etl/pull/2771
- https://github.com/owid/etl/pull/2752
- https://github.com/owid/etl/pull/2787
- https://github.com/owid/etl/pull/2791
- https://github.com/owid/etl/pull/2802
- https://github.com/owid/etl/pull/2801 (used it to test some fixes)
- https://github.com/owid/etl/pull/2831
- https://github.com/owid/etl/pull/2882
- https://github.com/owid/etl/pull/2892
- https://github.com/owid/etl/pull/2895
- Chart sync:
- https://github.com/owid/etl/pull/2615
- https://github.com/owid/etl/pull/2861
- https://github.com/owid/etl/pull/2890
- Indicator upgrade:
- https://github.com/owid/etl/pull/2734
- https://github.com/owid/etl/pull/2595
- https://github.com/owid/etl/pull/2859
This plan sounds good! I just created a PR with a tool I was playing around with to be able to detect automatically the grapher dataset mapping old-> new based on your current branch. I think some of that logic can be implemented in indicator upgrader (I'm happy to do that myself down the line).
Also, some other part of the logic of that tool (finding all possible files affected by your local work) could also be part of chart diff, but that's just nice to have (for now chart diff should relay on chart config differences).
I just finished an update of WDI with about 500 charts. It was pretty smooth, but not perfect. Some observations:
- Loading charts is slow on my ancient laptop. It's likely due to too many iframes being loaded. There might be a way how to avoid it, but I haven't tried it yet
- About 100 charts were identical, but I still had to approve them
- The problem is that detecting if charts are identical isn't as trivial as it first seems
- Having
Reject
button is necessary - When someone updates a chart in production, I have to manually make those changes in admin on staging and then approve. Having a "reset" button would help, but in the end manually editing a chart wasn't as bad
- It's way better than chart revisions!!!
Minor thing: I think it would be convenient to show if a chart is a draft in chart diff.
Two nice to have ideas from doing large GBD review:
- When sharing an issue with charts, I manually add links to both charts (example). Having a shareable URL to a single chart-diff would be nice (it could be a new page with chart slug in URL and show just the two charts)
- Imagine you are updating 500 charts and then someone edits 10 of them. You can:
- Manually update all of them (with help of config diff)
- Reject those charts and after merging your PR, run indicator upgrader again in production (or create a new staging server and do all there)
Is there an easier way? Maybe we could have a "reset" button for a chart that would reset config from production, but keep variable ids. Not sure how hard it would be to implement.
Regarding the first point @Marigold made above, an alternative implementation could be to use the test page, in the format
http://staging-site-{branch}/admin/test/embeds?ids={chart-id}&comparisonUrl=https%3A%2F%2Fourworldindata.org
I used that in a PIP issue:
[this list has been integrated into main issue list]
more points:
- update documentation to reflect data workflow
- guides to add, update data, update charts
- guides on wizard maintenance
- add a conflict resolver option: show as a form with fields that do bot match.
- add the link to page at the top (easy to copy), if a subset of charts are presented
- fix key bug for new charts
- more info for each chart diff: dataset name, draft?
- present modified and new charts together. filter them in the panel or via query params.
- display state change as a plot
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.