
Create a proportion of examples without PV data, outside the UK

Open JackKelly opened this issue 3 years ago • 5 comments

We currently only have PV data for the UK. We will at some point want to get PV data for elsewhere but, in the meantime, we'll need nowcasting_dataset to optionally output examples from outside the UK (to train the "image prediction" part of the model on the entire geospatial extent of the satellite imagery).

Maybe we should create two sets of batches on disk: one set which always has PV data (and is over the UK), and another set which is always from outside the UK (and doesn't have PV). Then the ML training script can mix-and-match examples on the fly to vary the ML training curriculum. To keep each batch balanced, the ML training script will need to load at least two batches at once from disk (one with PV data, the other without) and create a single batch with a mixture of examples.
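A rough sketch of what that mixing step might look like in the training script. Everything here is hypothetical: the `mix_batches` name, the `pv_fraction` parameter, and the representation of a batch as a list of example dicts (with `pv_yield` set to `None` for examples outside the UK) are assumptions for illustration, not the actual nowcasting_dataset batch format.

```python
import random


def mix_batches(pv_batch, no_pv_batch, pv_fraction=0.5, seed=None):
    """Build one batch from a with-PV batch and a without-PV batch.

    Both inputs are assumed (hypothetically) to be equal-length lists of
    example dicts. The output has the same length as each input.
    """
    if len(pv_batch) != len(no_pv_batch):
        raise ValueError("Both batches must contain the same number of examples.")
    if not 0.0 <= pv_fraction <= 1.0:
        raise ValueError("pv_fraction must be between 0 and 1.")

    rng = random.Random(seed)
    batch_size = len(pv_batch)
    n_pv = round(batch_size * pv_fraction)
    n_no_pv = batch_size - n_pv

    # Sample without replacement from each source batch, then shuffle so
    # that PV and non-PV examples are interleaved.
    mixed = rng.sample(pv_batch, n_pv) + rng.sample(no_pv_batch, n_no_pv)
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for two batches loaded from disk.
pv_batch = [{"example_id": f"uk_{i}", "pv_yield": [0.1 * i]} for i in range(8)]
no_pv_batch = [{"example_id": f"non_uk_{i}", "pv_yield": None} for i in range(8)]

mixed = mix_batches(pv_batch, no_pv_batch, pv_fraction=0.25, seed=0)
```

Changing `pv_fraction` between training steps would be one way to vary the curriculum without rewriting anything on disk.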

JackKelly avatar Sep 05 '21 10:09 JackKelly

Not sure this is essential. For WP1 there are some GSPs with very few or no PV systems, e.g. in Scotland.

peterdudfield avatar Sep 22 '21 17:09 peterdudfield

@jacobbieker in order to train your models in SatFlow, do you think it's essential for the dataset to include training examples from outside the UK? (these examples wouldn't have any PV data yet...)

JackKelly avatar Sep 25 '21 16:09 JackKelly

It's probably not essential, and for a model that will primarily be focused on the UK for now anyway, it probably doesn't matter as much!

jacobbieker avatar Sep 27 '21 07:09 jacobbieker

I'll remove this from the NG project, just to keep only really high-priority things in there.

peterdudfield avatar Sep 30 '21 13:09 peterdudfield

This could probably be done as part of #202

JackKelly avatar Oct 07 '21 08:10 JackKelly