
📊 Add USGS minerals data

Open pabloarosado opened this issue 7 months ago • 5 comments

Main changes

Sorry, this PR has snowballed a bit! Here's a summary of the main changes:

Data changes

  • Added USGS data on minerals.
  • Fixed various issues in the BGS minerals data.
  • Created steps to combine both datasets and build an explorer.
  • Moved steps about minerals from energy to a new minerals namespace (with its own dag yaml file).

ETL changes

  • Added an --explorer flag to etl run that creates an explorer and writes a TSV file to owid-content. We discussed this option in the data-architecture call and agreed that it is a step in the right direction. However, the implementation is not optimal:
    • Currently, --explorer works by setting an environment variable to 1. It would be better to handle this differently.
    • In theory, the new explorers step depends on the grapher://grapher step, so that is the dependency that should be added to the dag. However, this fails because GrapherStep does not implement a checksum_output() method, which breaks the ETL. For now, I have used the data://grapher step as a dependency instead.
  • Improved the Explorer object so that it can handle catalog-path-based explorers. It can also easily translate variable IDs into catalog paths (so we could use it for all existing indicator-based explorers).
  • Added an EXPLORERS_DIR variable to .env.example. This variable is only needed if you want to create explorers from the ETL and your local owid-content is not in the same folder as etl (which is unlikely).
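To make the current --explorer mechanism concrete, here is a minimal sketch of a CLI flag that signals downstream steps through an environment variable set to "1". It uses argparse purely for illustration, and the variable name ETL_CREATE_EXPLORER and the helper functions are hypothetical; the PR does not say which name or CLI library the real etl run uses.

```python
import argparse
import os

# Hypothetical environment variable name; the real one is not given in the PR.
EXPLORER_ENV_VAR = "ETL_CREATE_EXPLORER"


def parse_args(argv=None):
    """Parse a simplified 'etl run' command line (illustrative only)."""
    parser = argparse.ArgumentParser(prog="etl run")
    parser.add_argument("steps", nargs="*", help="Step names or patterns to run.")
    parser.add_argument(
        "--explorer",
        action="store_true",
        help="Also create explorers and write their TSV files to owid-content.",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if args.explorer:
        # Signal the flag to later steps through process-wide state.
        os.environ[EXPLORER_ENV_VAR] = "1"
    return args


def should_create_explorer():
    """Checked by a (hypothetical) explorers step before writing any TSV."""
    return os.environ.get(EXPLORER_ENV_VAR) == "1"
```

The weakness the PR alludes to is visible here: the flag lives in global process state rather than being passed explicitly, so any code running in the same process can see (or accidentally inherit) it. Passing the option down to the steps as an argument would be one way to avoid that.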
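Why depending on grapher://grapher breaks while data://grapher works can be illustrated with a toy model, assuming (as the PR suggests) that a step's input checksum is built from the checksum_output() of each of its dependencies. The classes and step paths below are simplified, hypothetical stand-ins, not the ETL's real step classes.

```python
import hashlib


class Step:
    """Hypothetical minimal step; the real ETL step classes differ."""

    def __init__(self, path, dependencies=()):
        self.path = path
        self.dependencies = list(dependencies)

    def checksum_output(self):
        raise NotImplementedError(f"{self.path} does not implement checksum_output()")

    def checksum_input(self):
        # Combine the output checksums of all dependencies, so every
        # dependency must be able to report one.
        h = hashlib.md5()
        for dep in self.dependencies:
            h.update(dep.checksum_output().encode())
        return h.hexdigest()


class DataStep(Step):
    def checksum_output(self):
        # Stand-in for hashing the dataset that the step wrote to disk.
        return hashlib.md5(self.path.encode()).hexdigest()


class GrapherStep(Step):
    # Mirrors the situation described in the PR: no checksum_output() of its own,
    # so the base class raises NotImplementedError.
    pass
```

With these definitions, an explorers step depending on a DataStep gets a valid input checksum, while one depending on a GrapherStep raises NotImplementedError as soon as its checksum is computed.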
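The variable-ID-to-catalog-path translation could look roughly like the sketch below, which rewrites one row of an explorer's graphers table given a lookup from IDs to catalog paths. The column names (yVariableIds and similar), the whitespace-separated ID format, and the example catalog paths are all assumptions for illustration; the real Explorer object may store and rename these fields differently.

```python
from typing import Dict

# Assumed columns that hold variable IDs in an indicator-based explorer.
ID_COLUMNS = ("yVariableIds", "xVariableId", "colorVariableId", "sizeVariableId")


def ids_to_catalog_paths(row: Dict[str, str], id_to_path: Dict[int, str]) -> Dict[str, str]:
    """Return a copy of a graphers-table row with variable IDs replaced by catalog paths."""
    new_row = dict(row)
    for column in ID_COLUMNS:
        value = (row.get(column) or "").strip()
        if not value:
            continue
        # Several IDs in one cell are assumed to be separated by whitespace.
        paths = [id_to_path[int(token)] for token in value.split()]
        new_row[column] = " ".join(paths)
    return new_row
```

Column names are left unchanged here for simplicity; a real catalog-path-based explorer may expect different column names, and an unknown ID raises KeyError rather than being silently dropped.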
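The role of EXPLORERS_DIR can be sketched as a simple lookup with a fallback: use the variable if it is set, otherwise assume owid-content sits next to etl. The "explorers" subfolder, the ".explorer.tsv" suffix, and both helper functions are assumptions made for this sketch, not confirmed details of the ETL.

```python
import os
from pathlib import Path


def resolve_explorers_dir(etl_dir: Path) -> Path:
    """Pick the folder where explorer TSV files are written (hypothetical helper)."""
    # An explicit EXPLORERS_DIR (for example, set in .env) takes precedence.
    env_value = os.environ.get("EXPLORERS_DIR")
    if env_value:
        return Path(env_value)
    # Otherwise assume owid-content is cloned in the same folder as etl.
    return etl_dir.parent / "owid-content" / "explorers"


def write_explorer_tsv(content: str, name: str, etl_dir: Path) -> Path:
    """Write an explorer's TSV content and return the file path (hypothetical helper)."""
    explorers_dir = resolve_explorers_dir(etl_dir)
    explorers_dir.mkdir(parents=True, exist_ok=True)
    path = explorers_dir / f"{name}.explorer.tsv"
    path.write_text(content)
    return path
```

This is why the variable is rarely needed: with the usual side-by-side checkout, the fallback already points at the right place.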

For reviewers

This PR is not yet ready to be merged, but I expect more and more small issues related to minerals data to come up. So it would be better to merge this PR soon and then open additional PRs to fix issues and improve metadata.

  • @Marigold could you please review the ETL changes (etl/command.py, etl/config.py, etl/explorer_helpers.py, etl/helpers.py)?
  • @lucasrodes I'm going to keep fixing things in the coming days. But if you don't mind, please have a general look at the rest of the data-related changes (as you already did with some of the files) and let me know if you see any major issues. No need for a thorough review (and I still need to improve the metadata significantly). If you want to have a look at the resulting explorer, you can preview it (and untick the "Display locally edited explorer" option in the top-right corner). But again, it's not fully ready yet; I'll talk to Hannah about various issues.

pabloarosado • Jul 12 '24 14:07