nowcasting_dataset

Move our data download & conversion scripts into separate repos (preparing for production)

Open JackKelly opened this issue 4 years ago • 5 comments

Our code for downloading and converting satellite data lives in the Satip repo.

Our code for downloading and converting other data sources (NWP, GSP PV, etc.) lives in this repo.

Are we happy with this situation?

Or should we have separate repos for downloading and preparing each data source? Or maybe just for the "big" data sources (e.g. NWPs? And GSP?).

Advantages of splitting out the downloading-and-converting scripts into separate repos:

  • Reduces number of dependencies required by nowcasting_dataset
  • Makes it a little easier for other people to re-use our downloading-and-converting code for other purposes

Disadvantages:

  • Managing multiple repos can get time-consuming!
  • Some of the data sources literally just require a single conversion script (e.g. converting the PassivSystems PV data). Do we really want a whole repo for a single conversion script?

What do you guys think, @jacobbieker and @peterdudfield?

JackKelly, Nov 09 '21 15:11

Possibly a repo for all the download scripts, but I wouldn't really separate them all out into separate repos, as yeah, it gets a bit confusing keeping everything in sync. Would it reduce a lot of dependencies? I would think a decent amount of the dependencies would still need to be used by nowcasting_dataset.

jacobbieker, Nov 09 '21 16:11

If we look ahead to the live system, we'll have a service for each data source; in that case I think it will be good to have separate repos. There will be docker files, separate tests, and infrastructure code, which are much easier to handle in smaller chunks. Versioning is also a bit easier then, i.e. we can update one data source at a time.

I would be tempted to leave them here for the moment, until we start building out these data_source 'consumers', and then that'll be a good time to separate things?

peterdudfield, Nov 09 '21 16:11

> If we look ahead to the live system, we'll have a service for each data source; in that case I think it will be good to have separate repos. There will be docker files, separate tests, and infrastructure code, which are much easier to handle in smaller chunks. Versioning is also a bit easier then, i.e. we can update one data source at a time.
>
> I would be tempted to leave them here for the moment, until we start building out these data_source 'consumers', and then that'll be a good time to separate things?

Yeah, that's a good point. So maybe leave this for later then? Also, if each one is then bundled with infra code, docker files, etc., it makes more sense as separate repos than it does now, when a lot of them would just be single files.

jacobbieker, Nov 09 '21 16:11

> If we look ahead to the live system, we'll have a service for each data source; in that case I think it will be good to have separate repos.

Ooh, that's an excellent point!

Cool. As you've both suggested, I agree that we should leave our data conversion scripts where they are for the remainder of 2021.

I've moved this issue into the "WP2" project.

JackKelly, Nov 09 '21 16:11

Good use of https://github.com/orgs/openclimatefix/projects/6

peterdudfield, Nov 09 '21 17:11