📊 Add USGS minerals data
Main changes
Sorry, this PR has snowballed a bit! Here's a summary of the main changes:
Data changes
- Added USGS data on minerals.
- Fixed various issues on BGS data on minerals.
- Created steps to combine both, and build an explorer.
- Moved steps about minerals from `energy` to a new `minerals` namespace (with its own dag yaml file).
ETL changes
- Added an `--explorer` flag to `etl run`, that creates an explorer and writes a `tsv` file to `owid-content`. We discussed this option in the data-architecture call and agreed that it's a move in the right direction. The implementation is not optimal, though:
  - Currently, `--explorer` sets an environment variable to 1. It would be better to handle this differently.
  - In theory, the new explorers step depends on the `grapher://grapher` step, so that's what should be added to the dag. However, this fails because `GrapherStep` does not have a `checksum_output()` method implemented, which makes the ETL break. For now, I used the `data://grapher` step as a dependency instead.
- Improved the `Explorer` object, so that it can handle catalog-path-based explorers. It can also easily translate from variable-id to catalog-paths (so we could use it for all existing indicator-based explorers).
- Added an `EXPLORERS_DIR` variable to `.env.example`. This variable is necessary if you want to create explorers from ETL, and your local `owid-content` is not in the same folder as `etl` (unlikely).
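
To make the flag and `EXPLORERS_DIR` behavior easier to picture, here is a minimal sketch of the idea. It is not the actual code in `etl/command.py` or `etl/config.py`. The environment variable name `EXPLORER`, the function names, and the `explorers` subfolder inside `owid-content` are all assumptions made for illustration. Only `EXPLORERS_DIR` and the "sibling folder" default come from the description above.

```python
import os
from pathlib import Path
from typing import Mapping

# Hypothetical name for the variable that `--explorer` sets to 1.
EXPLORER_FLAG_ENV = "EXPLORER"


def explorer_requested(env: Mapping[str, str]) -> bool:
    """Return True if the (hypothetical) flag variable is set to "1"."""
    return env.get(EXPLORER_FLAG_ENV, "0") == "1"


def resolve_explorers_dir(etl_dir: Path, env: Mapping[str, str]) -> Path:
    """Use EXPLORERS_DIR if set; otherwise assume owid-content sits next to etl."""
    custom = env.get("EXPLORERS_DIR")
    if custom:
        return Path(custom)
    # Assumption: explorer tsv files live in an "explorers" subfolder.
    return etl_dir.parent / "owid-content" / "explorers"


if __name__ == "__main__":
    etl_dir = Path.cwd()
    if explorer_requested(os.environ):
        target = resolve_explorers_dir(etl_dir, os.environ)
        print(f"Would write explorer tsv to {target}")
    else:
        print("Explorer creation not requested.")
```

Passing the environment in as a plain mapping, rather than reading `os.environ` inside the functions, keeps the sketch easy to test and hints at one way the flag could later be handled without a global environment variable.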
For reviewers
This PR is not yet ready to be merged, but I expect that more and more small issues related to minerals data are going to come up. So it would be better to merge this PR soon, and then have additional PRs to fix issues and improve metadata.
- @Marigold could you please review the ETL changes (`etl/command.py`, `etl/config.py`, `etl/explorer_helpers.py`, `etl/helpers.py`)?
- @lucasrodes I'm going to keep fixing things in the coming days. But if you don't mind, please have a general look at the rest of the data-related changes (as you did with some of the files already) and let me know if you see any major issues. No need for a thorough review (and I still need to improve metadata significantly). If you want to have a look at the resulting explorer, you can preview it (and untick the "Display locally edited explorer" option on the top-right). But again, it's not yet fully ready, I'll talk to Hannah about various issues.