One-liner

Define our ETL workflow for Explorers and MDIMs while unifying tooling as much as possible.

(previous context: https://github.com/owid/etl/issues/3969)

Context: MDIM vs Explorers

We have several kinds of similar objects in etl/owid-content (see this spreadsheet).

While we want to adopt more and more MDIM pages, we will still have explorers around. This is because both objects are, conceptually, different things:

MDIM: It is a data page, which, like any other data page, speaks about one specific indicator. The only difference is that, in the MDIM case, the indicator has multiple dimensions.
Explorer: Can host multiple indicators with different meanings.

Therefore, we need to improve the data workflow to support both products.

Goals

1. MDIMs and Explorers should come from ETL

Given the context above, and after various discussions, we agree that we should move towards having both explorers and MDIMs be ETL-based (export://explorers/ and export://multidim/, respectively).

NOTE: Ideally, the explorer config should live in a DB table (similar to the multi_dim_data_pages table) instead of a TSV file in owid-content, but this is a separate issue.

Migrate explorers (one-off)
- All explorers that live in owid-content should be generated automatically from ETL export://explorers steps.
  - https://github.com/owid/etl/pull/4071
- All CSV-based explorers should be converted into indicator-based explorers.
  - https://github.com/owid/owid-issues/issues/1850
  - https://github.com/owid/etl/issues/4072
Explorers as MDIMs?
- Some explorers may be converted into multidim pages when appropriate.
- Are there any specific explorers that would be low-hanging fruit to convert into MDIMs?

2. Standardize the tooling used in explorers and MDIMs

These two objects are very similar, and ideally, they should rely on standard tooling to minimize the maintenance burden. This implies some additional transition work in the coming months.

Are there functions already developed for MDIM pages that could be reused in existing indicator-based explorers?
- https://github.com/owid/etl/issues/4032
- https://github.com/owid/etl/pull/4035

3. Create a pleasant workflow experience for data scientists

Wizard: We should have a nice workflow in Wizard, where data scientists can easily create export steps (only explorers/mdims) from a generic template. Just as we can easily create data and snapshot steps from Wizard, we should be able to do the same for MDIMs and explorers.
- #3980
Schema/docs: We should validate the schemas used for MDIMs and Explorers. One idea is to structure our config information in Pythonic data classes. These same definitions should also power our docs (as with data and snapshots).
- #3976
- #3979
- #3977
Dimensions:
- https://github.com/owid/etl/issues/4007
- #4107
Update workflow
- #3956
Other issues
- #3981
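
To make the "Pythonic data classes" idea under Schema/docs more concrete, here is a minimal sketch of what a validated config could look like. Every class and field name below (`Choice`, `Dimension`, `View`, `Config`, `indicator`) is a hypothetical placeholder, not the actual MDIM or explorer schema; the point is only that validation can live next to the structure definition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Choice:
    slug: str
    name: str


@dataclass
class Dimension:
    slug: str
    name: str
    choices: List[Choice]

    def __post_init__(self) -> None:
        # A dimension without choices, or with repeated slugs, is invalid.
        if not self.choices:
            raise ValueError(f"Dimension '{self.slug}' has no choices")
        slugs = [c.slug for c in self.choices]
        if len(slugs) != len(set(slugs)):
            raise ValueError(f"Dimension '{self.slug}' has duplicate choice slugs")


@dataclass
class View:
    # Maps dimension slug -> chosen choice slug.
    dimensions: Dict[str, str]
    # Hypothetical reference to an indicator (e.g. a catalog path).
    indicator: str


@dataclass
class Config:
    title: str
    dimensions: List[Dimension]
    views: List[View]

    def __post_init__(self) -> None:
        # Every view must pick exactly one valid choice per declared dimension.
        allowed = {d.slug: {c.slug for c in d.choices} for d in self.dimensions}
        for view in self.views:
            if set(view.dimensions) != set(allowed):
                raise ValueError(f"View {view.dimensions} does not match the declared dimensions")
            for dim, choice in view.dimensions.items():
                if choice not in allowed[dim]:
                    raise ValueError(f"Unknown choice '{choice}' for dimension '{dim}'")
```

With something like this, a config with a typo in a view's choice would fail at construction time rather than when the page is rendered, and the same classes could be introspected to generate docs.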

Feb 17 '25 15:02 pabloarosado

I spent a good chunk of time browsing various explorers, and whoa... this isn't going to be easy. It feels like every explorer is unique, and there's no obvious way to have a single approach for everything. The only thing I can confidently say is that CSV-based explorers are bad (though that alone doesn’t justify spending time migrating them).

I'm still wrapping my head around everything, so take the following notes with a grain of salt.

1. MDIMs and Explorers Should Come from ETL

The main question is whether we'd allow editing explorers from Admin or not. If yes, we'd need either some kind of "override" in the Admin layer (either in owid-content or the DB) or a way to write changes back to ETL. (Remember that we did this for indicator metadata, and it's used very rarely.)

Explorers with many combinations, like minerals, are well suited for ETL, but more bespoke explorers, like migration, are much more complex. Then again, some people prefer YAML, while others prefer Python, and it's unclear whether we should enforce a single approach.

2. Standardize the Tooling Used in Explorers and MDIMs

@lucasrodes has already done this with the COVID explorer and COVID MDIM. The explorer YAML representation is really close to MDIMs. I can imagine generating a similar config file that could power both MDIMs and (indicator-based) explorers. If we can make it work for COVID, where we’re already pretty close, then it should be doable for anything. But does this grand unification bring enough value?

I guess we need a couple more MDIMs to better decide where to put our energy.
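
To illustrate what "one config powering both" could mean, here is a rough sketch. It assumes a shared config that declares dimensions and a naming rule for indicators, and derives both a nested MDIM-like structure and flat explorer-like rows from it. The config keys, the `indicator_template` rule, the sample values, and both output shapes are made up for illustration; they are not the real MDIM or explorer formats.

```python
from itertools import product

# Hypothetical shared config; values are placeholders.
shared = {
    "title": "Example",
    "dimensions": {
        "metric": ["cases", "deaths"],
        "interval": ["daily", "weekly"],
    },
    # Assumed rule for deriving an indicator reference from the chosen dimensions.
    "indicator_template": "example#{metric}_{interval}",
}


def expand_views(config):
    """Build one view per combination of dimension choices."""
    names = list(config["dimensions"])
    views = []
    for combo in product(*(config["dimensions"][n] for n in names)):
        choices = dict(zip(names, combo))
        views.append(
            {
                "dimensions": choices,
                "indicator": config["indicator_template"].format(**choices),
            }
        )
    return views


def to_mdim(config):
    """Nested output, loosely in the spirit of an MDIM config."""
    return {"title": config["title"], "views": expand_views(config)}


def to_explorer_rows(config):
    """Flat output, one row per view, loosely in the spirit of an explorer table."""
    return [{**v["dimensions"], "indicator": v["indicator"]} for v in expand_views(config)]
```

This only works as-is when every combination of choices maps to an indicator by a simple rule, which fits explorers like minerals better than bespoke ones like migration.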

Appendix

Some explorers I found interesting:

Water and Sanitation – CSV-based explorer, could be worth migrating to indicator-based.
Monkeypox – CSV-based explorer, more bespoke. Could it be migrated to ETL, and would it be worth it?

Feb 20 '25 09:02 Marigold

Thanks for the summary, @Marigold! You touch on very valid points.

Just to disclose my bias up front, my dream is to migrate all explorers and have a standardized way of doing things in the MDIM/explorer space, as we have for data steps.

My take is that this might not provide much value in the short term, but it will in the long term. I'm especially concerned with the update flow, where I think we should be able to assume that everything is ETL-powered. So I don't think this is super urgent, but it is a goal that would be great to reach in, say, 1-2 years' time.

In general, I think that deprecating CSV-based (and chart-based) explorers will help us maintain our infrastructure in the long run. When developing tools, it's annoying to have to account for all these edge cases that do not come from ETL.

1. MDIMs and Explorers Should Come from ETL

I think we should probably create an issue with all explorers and rank them somehow by type or complexity. Also, whenever attempting to "migrate" one, we should advertise it to avoid conflicts with other edits.

One risk here is that the data scientist in charge of an explorer might be used to their current pipeline, so we should make sure that the new indicator-based explorer is easy to understand and comes with appropriate tooling. I think it could make sense to do this after agreeing on some templating (as in MDIMs) in point 2 below.

2. Standardize the Tooling Used in Explorers and MDIMs

I am happy to look at the COVID explorer again and see how the MDIM tooling/approach can be applied there.

I think that we may need some engineering work here to add some of the features that we now have on MDIMs (being able to reference indicators by catalogPath, display settings per view, etc.). Basically, it'd be nice to improve the explorer config API on the engineering side and align it with MDIMs a bit.

Feb 20 '25 09:02 lucasrodes

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Jul 18 '25 09:07 stale[bot]

Tracking: roadmap for explorers and mdims
