nowcasting_dataset
Create a proportion of examples without PV data, outside the UK
We currently only have PV data for the UK. We will at some point want to get PV data for elsewhere but, in the meantime, we'll need nowcasting_dataset to optionally output examples from outside the UK (to train the "image prediction" part of the model on the entire geospatial extent of the satellite imagery).
Maybe we should create two sets of batches on disk: one set which always has PV data (and is over the UK), and another set which is always from outside the UK (and doesn't have PV). Then the ML training script can mix-and-match examples on the fly to vary the ML training curriculum. To keep each batch balanced, the ML training script will need to load at least two batches at once from disk (one with PV data, the other without) and create a single batch with a mixture of examples.
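A rough sketch of what that mixing step could look like, just to make the idea concrete. Everything here is hypothetical: `mix_batches`, `pv_fraction`, and the representation of a batch as a list of example dicts (with `pv` set to `None` for non-UK examples) are assumptions for illustration, not the existing nowcasting_dataset batch format or API.

```python
import random


def mix_batches(pv_batch, no_pv_batch, pv_fraction=0.5, seed=None):
    """Combine two same-sized batches into one mixed batch.

    Hypothetical helper: each batch is assumed to be a list of example
    dicts. `pv_fraction` sets the share of examples taken from the
    batch that has PV data.
    """
    if len(pv_batch) != len(no_pv_batch):
        raise ValueError("Both batches must contain the same number of examples")
    if not 0.0 <= pv_fraction <= 1.0:
        raise ValueError("pv_fraction must be between 0 and 1")

    rng = random.Random(seed)
    batch_size = len(pv_batch)
    n_pv = round(batch_size * pv_fraction)

    # Sample without replacement from each source batch.
    mixed = rng.sample(pv_batch, n_pv) + rng.sample(no_pv_batch, batch_size - n_pv)

    # Shuffle so PV and non-PV examples are interleaved within the batch.
    rng.shuffle(mixed)
    return mixed


# Toy data: 8 UK examples with PV, 8 non-UK examples without PV.
pv_batch = [{"region": "uk", "pv": [0.1 * i]} for i in range(8)]
no_pv_batch = [{"region": "non_uk", "pv": None} for _ in range(8)]

mixed = mix_batches(pv_batch, no_pv_batch, pv_fraction=0.25, seed=0)
n_with_pv = sum(example["pv"] is not None for example in mixed)
```

Changing `pv_fraction` between training stages would be one way to vary the curriculum without rewriting the batches on disk. The training loss would still need to skip the PV term for examples where `pv` is missing.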
Not sure this is essential. For WP1 there are some GSPs with very few or no PV systems, e.g. in Scotland.
@jacobbieker in order to train your models in SatFlow, do you think it's essential for the dataset to include training examples from outside the UK? (these examples wouldn't have any PV data yet...)
It's probably not essential, and for a model that will primarily be focused on the UK for now anyway, it probably doesn't matter as much!
I'll remove this from the NG project, just to keep only the really high-priority things in there.
This could probably be done as part of #202