List the main person(s) working on each ETL step
Problem
We currently have no easy way to know who has worked on a particular ETL step or grapher dataset.
If, for example, something odd is found in the data, we don't know straight away who the contact person is, so we can't easily tag them in an issue or on Slack.
Hence, it would be convenient to have that information both in the ETL dashboard and in the grapher admin.
Possible solution
Add a metadata field
One possible solution would be to add owid_authors (or some other name) as a new field (a list of literals) in the snapshot metadata. It would be propagated automatically to later steps by taking the union of the authors of the steps they depend on. As with any other metadata field, it could be manually overwritten at a later step.
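A minimal sketch of the propagation rule, assuming each upstream step exposes its authors as a plain list of strings. The function name `propagate_authors` and the `override` parameter are hypothetical and only illustrate the idea; they are not part of the existing ETL API.

```python
from typing import Iterable, List, Optional


def propagate_authors(
    upstream_authors: Iterable[List[str]],
    override: Optional[List[str]] = None,
) -> List[str]:
    """Return the authors for a step (hypothetical helper).

    If `override` is given, it wins, mirroring a manual overwrite of the
    metadata field. Otherwise the result is the union of all upstream
    author lists, keeping first-seen order so the output is deterministic.
    """
    if override is not None:
        return list(override)

    merged: List[str] = []
    seen = set()
    for authors in upstream_authors:
        for author in authors:
            if author not in seen:
                seen.add(author)
                merged.append(author)
    return merged


# Example: a step that depends on two snapshots.
authors = propagate_authors([["alice", "bob"], ["bob", "carol"]])
```

Keeping first-seen order rather than returning a `set` makes the field stable across runs, which avoids spurious metadata diffs.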
Use git metadata
We could check who has committed to a step. This would have the benefit of filling the information in historically. However, bots also commit, and many ingredients go into a step. It might also pick up people who are not the main contributor, e.g. Mojmir gets tagged on EVERYTHING 🎉
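A rough sketch of the git-based approach, assuming author names come from `git log --format=%an -- <path>` and that bot accounts can be recognised by a `[bot]` suffix or an explicit list. The helper names, the `min_commits` threshold, and the bot heuristic are assumptions for illustration, not an agreed design.

```python
import subprocess
from collections import Counter
from typing import FrozenSet, List, Tuple


def rank_authors(
    log_output: str,
    bots: FrozenSet[str] = frozenset(),
    min_commits: int = 1,
) -> List[Tuple[str, int]]:
    """Count commits per author name, dropping bots and rare committers.

    `log_output` is expected to hold one author name per line.
    """
    names = [line.strip() for line in log_output.splitlines() if line.strip()]
    counts = Counter(
        name for name in names
        if not name.endswith("[bot]") and name not in bots
    )
    return [(name, n) for name, n in counts.most_common() if n >= min_commits]


def git_authors(path: str, min_commits: int = 1) -> List[Tuple[str, int]]:
    """Rank the human authors who committed to `path` in the current repo."""
    result = subprocess.run(
        ["git", "log", "--format=%an", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return rank_authors(result.stdout, min_commits=min_commits)
```

Raising `min_commits` is one crude way to drop people who only touched a step in passing, though it would not filter out someone who makes small commits to every step.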
What's in scope
- [ ] Create a metadata field
- [ ] Ensure it gets populated automatically (e.g. in Wizard, or from previous data version)
- [ ] Ensure that it propagates, e.g. from snapshot onwards
- [ ] Tell people about it
Out of scope
- Backfilling the field for existing steps (nice to have, not required)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.