wizard: indicator upgrader
We are working on a new data workflow. A core part of it will be the Indicator Upgrader, which will help us migrate indicators from an old dataset to a new one in a smoother way than the current Chart Upgrader.
The new features of this step are:
- Automatic detection of new datasets. The new datasets are listed in the "new dataset" dropdown.
- Smart matching between old and new datasets. We match old datasets with new datasets using `VersionTracker` to detect previous versions. We have also improved the similarity score between them using `rapidfuzz`.
- Charts are automatically updated, i.e. there is no need for chart revisions. (This tool should only be run in staging; it is disabled in production.)
- Improved general layout of the tool: ignore checkboxes placed on the left, more view options (e.g. display dataset names as ETL paths instead of titles), etc.
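The smart-matching step relies on `VersionTracker`, whose API is not shown in this thread. Purely to illustrate the idea, the sketch below finds a dataset's previous version by comparing ETL-style paths, assuming they look like `grapher/<namespace>/<version>/<short_name>` with ISO-date versions (so plain string comparison orders them chronologically). `StepPath`, `find_previous_version` and the example paths are hypothetical; this is not how `VersionTracker` is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StepPath:
    """ETL-style path, assumed to be '<channel>/<namespace>/<version>/<short_name>'."""

    channel: str
    namespace: str
    version: str
    short_name: str

    @classmethod
    def parse(cls, path: str) -> "StepPath":
        channel, namespace, version, short_name = path.split("/")
        return cls(channel, namespace, version, short_name)

    def __str__(self) -> str:
        return f"{self.channel}/{self.namespace}/{self.version}/{self.short_name}"


def find_previous_version(new_path: str, existing_paths: List[str]) -> Optional[str]:
    """Return the newest existing path that shares channel, namespace and short_name
    with new_path but has an older version; None if there is no previous version.

    Versions are compared as strings, which orders ISO dates (YYYY-MM-DD) correctly
    but would not handle other version formats.
    """
    new = StepPath.parse(new_path)
    same_dataset_older = [
        old
        for old in map(StepPath.parse, existing_paths)
        if (old.channel, old.namespace, old.short_name)
        == (new.channel, new.namespace, new.short_name)
        and old.version < new.version
    ]
    if not same_dataset_older:
        return None
    return str(max(same_dataset_older, key=lambda p: p.version))
```

For example, given the hypothetical paths `grapher/example/2023-01-01/ds` and `grapher/example/2023-06-01/ds`, a new step `grapher/example/2024-01-01/ds` would be matched to the 2023-06-01 version.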
Goal of this issue
We have an MVP ready:
- https://github.com/owid/etl/pull/2595
The goal of this issue is to gather feedback on the new tool, including bugs to fix and feature proposals.
I've addressed some improvements to the Indicator Upgrader in https://github.com/owid/etl/pull/2595. I still need to test it with different setups to see how it behaves:
- [x] new grapher step(s), with previous version, pushed to Grapher
- [ ] new grapher step(s), with previous version, not pushed to Grapher
- [ ] new grapher step(s), without previous version, pushed to Grapher
- [ ] new grapher step(s), without previous version, not pushed to Grapher
- [ ] no change
TODOs:
- [x] Several datasets are shown in the "new datasets" selectbox besides the one I added. It looks like recently added datasets are also shown.
- [x] I think this might be because the branch is not in sync with live. It is actually detecting something new in master, not in the branch. The tool should only detect new additions in the branch, and not in master!
- [x] The selected new dataset changes when the "show step names" option is enabled. To reproduce: choose dataset A → enable the option → see the selected new dataset change (but not the old one!)
- [ ] Fuzzy-match indicators based on short_names (instead of indicator titles)
- [ ] After migrating a subset of the indicators (i.e. charts are no longer using the old versions), if I run the indicator upgrader again, these indicators are still shown.
- [x] When submitting (3/3), the text says "submitting chart revisions". It should say something like "updating charts" instead.
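For the open TODO about fuzzy-matching indicators by short name, here is a minimal sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer rather than `rapidfuzz`, and the `match_by_short_name` helper, its 0-100 scale and the 60-point threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Similarity between two short names on a 0-100 scale (difflib stand-in for rapidfuzz)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_by_short_name(
    old_names: List[str], new_names: List[str], min_score: float = 60.0
) -> Dict[str, Optional[Tuple[str, float]]]:
    """Map each old short name to its best new short name and score, or None.

    Exact matches are taken first; otherwise the most similar new name is kept
    only if its score reaches min_score (an arbitrary threshold for this sketch).
    """
    new_set = set(new_names)
    mapping: Dict[str, Optional[Tuple[str, float]]] = {}
    for old in old_names:
        if old in new_set:
            mapping[old] = (old, 100.0)
            continue
        scored = [(new, similarity(old, new)) for new in new_names]
        best = max(scored, key=lambda pair: pair[1], default=None)
        mapping[old] = best if best is not None and best[1] >= min_score else None
    return mapping
```

Checking exact matches first means an identical short name can never lose to a fuzzy candidate, and the threshold leaves weak matches unmapped for manual review instead of guessing.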
More TODOs:
- [x] On "Explore mode", there are multiple bugs:
- [x] Score is often `-inf`
- [x] Chart with error distribution is shown twice
- [x] Relative error could be more illustrative. Positive relative errors (i.e. increases) are unbounded, but decreases are currently bounded at -100% (i.e. any number → 0 is a 100% decrease). We need something that also shows negative changes clearly. Maybe a log scale?
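To make the "log" idea concrete: for positive values, the percentage change is bounded below at -100% but unbounded above, whereas a log ratio treats increases and decreases symmetrically. A minimal sketch; the helper names are hypothetical and not taken from the tool.

```python
import math


def relative_error(old: float, new: float) -> float:
    """Percentage change from old to new. For positive old and non-negative new,
    it never drops below -100% (new == 0) but has no upper bound."""
    return 100 * (new - old) / old


def log_ratio(old: float, new: float) -> float:
    """log10(new / old): 0 means no change, and a halving (about -0.301)
    mirrors a doubling (about +0.301). Requires old > 0 and new > 0."""
    return math.log10(new / old)
```

Neither measure handles zeros gracefully: `relative_error` raises `ZeroDivisionError` when `old` is 0, and `math.log10` raises `ValueError` for zero or negative inputs (NumPy's `np.log10(0)` instead returns `-inf` with a warning), so zero values would need explicit handling.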
More improvements:
- [x] Download data for "explore mode" in parallel when creating `steps_df`
- [x] Rearrange the UX so that it is more obvious that the 'old dataset' dropdown depends on the selection in 'new dataset'. For instance, align them vertically: first 'new dataset', then 'old dataset'
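One way to download the explore-mode data in parallel is a thread pool, since the work is I/O-bound. This is a sketch under assumptions: `download_all` and the per-key `download` callable are hypothetical placeholders, and how `steps_df` is actually built in the tool is not shown in this thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def download_all(
    keys: Iterable[str], download: Callable[[str], T], max_workers: int = 8
) -> Dict[str, T]:
    """Run a (hypothetical) per-key download function concurrently and collect
    the results by key. Threads suit I/O-bound work such as HTTP or DB requests."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input keys.
        results = executor.map(download, keys)
        return dict(zip(keys, results))
```

Because `executor.map` preserves input order, zipping the results back with the keys is safe; an exception raised by any single download is re-raised when the results are collected.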
Bug:
- [x] If you click twice on "Next (1/3)", the following error is shown. Instead, nothing should happen.
StreamlitAPIException: Values for st.button, st.download_button, st.file_uploader, st.data_editor, st.chat_input, and st.form cannot be set using st.session_state. Traceback: File "/home/owid/etl/apps/wizard/pages/indicator_upgrade/app.py", line 128, in <module> indicator_config = ask_and_get_indicator_mapping(search_form) File "/home/owid/etl/apps/wizard/pages/indicator_upgrade/indicator_mapping.py", line 338, in ask_and_get_indicator_mapping old_var_selectbox, ignore_selectbox, new_var_selectbox = st_mapping_manual( File "/home/owid/etl/apps/wizard/pages/indicator_upgrade/indicator_mapping.py", line 207, in st_mapping_manual show_explore = grid_indicators_manual.button(
More:
- [x] The 'old dataset' selectbox is now updated based on the selection in the 'new dataset' selectbox. Show a spinner while this update is occurring, so that the user does not interact with 'old dataset' before it has been updated automatically.
In case it happens again:
When using the indicator upgrader on the annual climate change impacts dataset, I got the following error after clicking "Update charts":
Something went wrong! 502 Server Error: Bad Gateway for url: http://staging-site-update-climate-change-data/admin/api/charts/7568
I just refreshed, tried again, and this time it worked.
NOTE: My intuition is that this happened because the staging server was not fully built, so it may not be an issue with the indicator upgrader.
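If such 502 errors turn out to be transient (as here, where trying again worked), a small retry with exponential backoff could smooth them over. A minimal sketch, assuming the chart update is wrapped in a zero-argument callable that raises a hypothetical `TransientError` on 5xx responses; it also assumes the request is safe to repeat, which would need checking for the actual charts API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error raised for failures worth retrying, e.g. 502 Bad Gateway."""


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn up to `attempts` times, sleeping base_delay * 2**i between tries.

    Only TransientError triggers a retry; other exceptions propagate immediately,
    and the final attempt lets any error propagate to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return fn()
        except TransientError:
            time.sleep(base_delay * 2**attempt)
    return fn()
```

A retry only hides genuinely transient failures; if the staging server is still being built, the last attempt will still fail and surface the error.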
Feature request: I had to update two grapher datasets (annual and monthly climate change impacts). When I first opened the indicator upgrader, it conveniently had the annual climate change dataset already selected. But after I had updated that one and refreshed the page, the annual dataset was selected again. It would be more useful if a dataset that hasn't been mapped yet were selected automatically.
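A possible heuristic for this request: preselect the first new dataset whose previous version is still used by charts, and fall back to the first dataset once everything has been mapped. The `default_new_dataset` helper and its `charts_on_old_version` input (the number of charts still pointing at each dataset's previous version) are hypothetical; the tool would need some way to compute that count.

```python
from typing import Dict, List, Optional


def default_new_dataset(
    new_datasets: List[str], charts_on_old_version: Dict[str, int]
) -> Optional[str]:
    """Pick the first new dataset whose previous version is still used by charts.

    charts_on_old_version maps each new dataset to the number of charts that
    still use indicators from its previous version (hypothetical input).
    Falls back to the first dataset if all have been mapped, or None if empty.
    """
    for ds in new_datasets:
        if charts_on_old_version.get(ds, 0) > 0:
            return ds
    return new_datasets[0] if new_datasets else None
```

With the placeholder names `["annual", "monthly"]` and counts `{"annual": 0, "monthly": 5}`, this would preselect `"monthly"` after the annual dataset has been upgraded.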
Feature request: It would be good if indicator upgrader worked in local grapher, if it's not too much additional work.
> Feature request: It would be good if indicator upgrader worked in local grapher, if it's not too much additional work.
Currently, chart-diff and the indicator upgrader do not support local development. I haven't looked into this in detail, but I expect chart-diff to be harder to adapt to local development than the indicator upgrader.
Is there a use case where you would want to run indicator-upgrader locally but not chart-diff?
cc. @pabloarosado
I was updating all the climate steps, for which I used the dashboard. Then I wanted to archive the old steps, but the archiving tool relies on what's in grapher. So I tried to upgrade my local datasets, but the indicator upgrader didn't work locally. One solution is to do everything on staging (including updating and archiving).
@lucasrodes Can I suggest you make new issues for anything remaining that you care about, then close this one?