the_od_bods

Move /data to a repo of its own

Open KarenJewell opened this issue 2 years ago • 0 comments

Is your feature request related to a problem? Please describe.

Currently, all data is stored in /data and updated by the weekly pipeline refreshes. This directory is the output location for all scrapers and also the input for the eventual dataset listings on opendata.scot.

However, the constant churn in /data causes unnecessary merge conflicts during development, even when the code base itself hasn't changed.

Describe the solution you'd like

  • [ ] Create a new repo for /data and use it as the storage location.
  • [ ] Update all source scrapers to write to the new repo.
  • [ ] Update merge_data.py to read from the new repo.
  • [ ] Update the pipeline to write to the new repo.
  • [ ] Delete /data from the_od_bods.
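
One way the scraper and merge_data.py items could share a single storage location is sketched below. This is only an illustration under stated assumptions: the environment variable `OD_BODS_DATA_DIR` and the helpers `resolve_data_dir` and `output_path` are hypothetical names, not part of the existing code base, and the sketch assumes the new data repo is checked out somewhere on the local filesystem.

```python
import os
from pathlib import Path

# Hypothetical environment variable name (an assumption, not an existing setting).
DATA_DIR_ENV_VAR = "OD_BODS_DATA_DIR"


def resolve_data_dir(default="data"):
    """Return the data directory: the env var if set, else the in-repo default."""
    return Path(os.environ.get(DATA_DIR_ENV_VAR, default))


def output_path(filename, default="data"):
    """Build the path for an output file, creating the data directory if needed."""
    data_dir = resolve_data_dir(default)
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir / filename
```

With something like this in place, scrapers and merge_data.py would call `resolve_data_dir()` instead of hard-coding /data, so pointing `OD_BODS_DATA_DIR` at a checkout of the new repo would redirect both reads and writes without further code changes.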

Describe alternatives you've considered

The alternative is to not write or store any data in the intermediary steps. However, this would be a big change to the way we process data, it would make debugging harder, and losing the intermediary step may be unhelpful for new contributors.

Additional context

None

KarenJewell, Feb 18 '23 16:02