nowcasting_dataset
Add Rainfall Radar data
I've looked into this for SatFlow here: https://github.com/openclimatefix/satflow/issues/7. It seems we can read the Nimrod files from the Met Office using iris.
It's at 1 km resolution for the UK and 5 km resolution for Europe, so I'm not sure whether we would want to resample both to the same resolution, or keep them at their native resolutions and resample at a later step?
Sounds good to me!
Some more notes about precipitation data here: https://github.com/openclimatefix/nowcasting_dataset/issues/9
> not sure if we would want to resample it to the same resolution, or keep them at their native resolution and resample at a later step
Hmm, good question!
When using models like Perceiver IO, I'm curious to try giving the model the raw (un-resampled) data, because the model doesn't care about data being on a neat grid and because resampling always introduces artefacts, so using the 'pure' data might improve performance. (For some more discussion, see https://github.com/openclimatefix/predict_pv_yield/issues/64.) But then we wouldn't be able to use the same dataset for models which do need data to be on neat grids (or for Perceiver with a CNN front-end). We could optionally resample on-the-fly during training, if it's not too expensive?
(In general, maybe we need a thin "data loading" wrapper which can optionally do some data transformation on-the-fly? That feels a little messier than doing 100% of the data processing ahead-of-time, but it's probably necessary?)
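To give a rough feel for what on-the-fly resampling of the radar grid could look like, here is a minimal sketch that coarsens a 1 km field to 5 km by averaging non-overlapping 5x5 blocks. It assumes the field is a plain 2D NumPy array whose dimensions divide evenly by the factor and that the coarse grid lines up with the fine one; the function name `coarsen_by_block_mean` is hypothetical, and real Nimrod data would also need handling for missing values and for the two products' actual grids.

```python
import numpy as np


def coarsen_by_block_mean(field: np.ndarray, factor: int) -> np.ndarray:
    """Downsample a 2D field by averaging non-overlapping factor x factor blocks.

    Hypothetical helper for illustration; assumes the grid divides evenly.
    """
    if field.ndim != 2:
        raise ValueError("expected a 2D array with shape (y, x)")
    height, width = field.shape
    if height % factor or width % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    # Split each axis into (number of blocks, block size), then average
    # over the two block-size axes.
    blocks = field.reshape(height // factor, factor, width // factor, factor)
    return blocks.mean(axis=(1, 3))


# Toy 10 km x 10 km field at 1 km resolution (made-up values).
radar_1km = np.arange(100, dtype=np.float64).reshape(10, 10)
radar_5km = coarsen_by_block_mean(radar_1km, factor=5)
print(radar_5km.shape)
```

A reshape plus mean like this is cheap compared with a full reprojection, so something along these lines might be affordable inside a training loop, though that would need measuring on real data.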
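As a sketch of the idea (not an existing API in nowcasting_dataset), a thin wrapper could hold the pre-processed examples and apply an optional transform only when an example is fetched. The class name `ThinLoader` and the `transform` parameter are hypothetical, and the examples are assumed to be NumPy arrays already loaded in memory.

```python
from typing import Callable, Optional, Sequence

import numpy as np

Transform = Callable[[np.ndarray], np.ndarray]


class ThinLoader:
    """Hypothetical wrapper: returns stored examples, optionally transformed."""

    def __init__(
        self,
        examples: Sequence[np.ndarray],
        transform: Optional[Transform] = None,
    ) -> None:
        self.examples = examples
        self.transform = transform

    def __len__(self) -> int:
        return len(self.examples)

    def __getitem__(self, index: int) -> np.ndarray:
        example = self.examples[index]
        # Only do extra work when a transform was requested, so models that
        # want the raw data pay no extra cost.
        if self.transform is not None:
            example = self.transform(example)
        return example


# Two made-up 4x4 examples standing in for pre-processed data on disk.
examples = [np.arange(16.0).reshape(4, 4), np.ones((4, 4))]

raw_loader = ThinLoader(examples)  # e.g. for a Perceiver-style model
# Crude subsampling as a stand-in for a real resampling step.
coarse_loader = ThinLoader(examples, transform=lambda x: x[::2, ::2])

print(raw_loader[0].shape, coarse_loader[0].shape)
```

The same stored dataset then serves both kinds of model, and only the loader that needs a regular grid pays for the transform.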
Yeah, I think a thin data loading wrapper would be useful. Even when I preprocess the training data and save it to disk, it's useful to be able to change exactly how many past and future steps to include. For example, saving the last hour of processed timesteps to disk and then being able to cut that down to only the last half hour makes it easy to see the effect of extra timesteps, without needing to save one copy of the dataset for the last hour, another for the last half hour, etc. I did a little bit of that with https://github.com/openclimatefix/satflow/blob/2be1708bb37916d23ffcf18dd1a1cfd6a709acf5/satflow/data/datasets.py#L649 in a bit of a hacky way by manually putting in the slices, but it's still fairly simple and was quite quick. Resampling seems to be somewhat compute intensive? At least for satellite imagery, but maybe it's a bit easier for radar?
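The timestep slicing described above can be sketched roughly as follows. This is not the SatFlow code linked above, just an illustration under assumptions: each saved example is a NumPy array shaped (time, y, x), `t0_index` marks the first forecast step, and the function name `select_timesteps` is hypothetical.

```python
from typing import Tuple

import numpy as np


def select_timesteps(
    example: np.ndarray,
    t0_index: int,
    history_steps: int,
    forecast_steps: int,
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (past, future) windows around t0 from a (time, y, x) array.

    Hypothetical helper: past is the history_steps frames before t0_index,
    future is the forecast_steps frames starting at t0_index.
    """
    if history_steps > t0_index:
        raise ValueError("not enough saved history for the requested window")
    if t0_index + forecast_steps > example.shape[0]:
        raise ValueError("not enough saved future steps for the requested window")
    past = example[t0_index - history_steps : t0_index]
    future = example[t0_index : t0_index + forecast_steps]
    return past, future


# Made-up example: 12 saved history steps followed by 12 future steps.
saved = np.arange(24).reshape(24, 1, 1)

# Use only the most recent 6 history steps instead of all 12.
past, future = select_timesteps(saved, t0_index=12, history_steps=6, forecast_steps=12)
print(past.shape, future.shape)
```

Because slicing a NumPy array returns a view, changing the window length this way costs almost nothing, so one saved copy of the dataset can back several experiments.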
Discussion of "thin data-loading layer" here: #97