Move /data to a repo of its own
Is your feature request related to a problem? Please describe.
Currently all data is stored in /data and updated by the weekly pipeline refreshes. This directory is both the output location of all scrapers and the input for the eventual dataset listings on opendata.scot.
However, the constant change in /data is causing unnecessary merge conflicts in development even when the actual code base hasn't changed.
Describe the solution you'd like
- [ ] Create a new repo for `/data` and use it as the storage location.
- [ ] All source scrapers to write to new repo
- [ ] merge_data.py to read from new repo
- [ ] update pipeline to write to new repo
- [ ] delete `/data` in `the_od_bods`
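One way the scrapers, merge_data.py and the pipeline could all be pointed at the new repo is a single shared helper that resolves the data location. The following is only a minimal sketch under stated assumptions: the environment variable `OD_BODS_DATA_DIR`, the default checkout directory `../the_od_bods_data`, the helper names and the `<scraper>.csv` file naming are all hypothetical and do not exist in the project today.

```python
import os
from pathlib import Path

# Hypothetical names: neither this environment variable nor this default
# directory exists in the project; they only illustrate the idea.
DATA_DIR_ENV = "OD_BODS_DATA_DIR"
DEFAULT_DATA_DIR = Path("..") / "the_od_bods_data"


def get_data_dir(env=None):
    """Return the directory scrapers write to and merge_data.py reads from.

    `env` defaults to the process environment; passing a dict makes the
    function easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    return Path(env.get(DATA_DIR_ENV, DEFAULT_DATA_DIR)).expanduser()


def scraper_output_path(scraper_name, env=None):
    """Build an output path for one scraper inside the external data checkout.

    The `<scraper_name>.csv` naming is illustrative, not the project's layout.
    """
    return get_data_dir(env) / f"{scraper_name}.csv"
```

With something like this in place, removing `/data` from the code repo would only require each scraper and merge_data.py to call the helper instead of using a hard-coded path, and the weekly pipeline could set the variable to wherever it checks out the data repo.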
Describe alternatives you've considered
The alternative is not to write or store any data at all in intermediary steps. However, this would be a big change to the way we process data, it would make debugging harder, and losing the intermediary step may be unhelpful for new contributors.
Additional context
None