nowcasting_dataset
Implement `PVPhysicsPredictionDataSource`
Detailed Description
For all timesteps, and for all PV systems in the region of interest, include:
- Two sets of predicted PV power using `pvlib`'s physical PV prediction. Use the PV system orientation metadata (if available):
  - Use NWP predictions (using an NWP init time at or before `t0`).
  - Use clearsky.
  - ~~Maybe experiment with manually mapping from the inverter make and manufacturer in the metadata to pvlib's specifications.~~ UPDATE: I'm not sure the effort is worth the payback.
- ~~The max actual PV power for each time of interest from the last 2 weeks.~~
  - ~~Need to do some experimentation to check if 2 weeks is a good time. It might be better to find the max for a given sun angle.~~
  - ~~This is useful for 2 reasons:~~
    - ~~To create a "shading-aware" physics based PV forecast: `min(pvlib_forecast(t), max_pv_power_for_last_2_weeks(t))`.~~
    - ~~`actual_pv_power(t) / max_pv_power_for_last_2_weeks(t)` should tell us what proportion of sunlight is being blocked by clouds.~~

  UPDATE: I think the PV power production signal is too noisy to use simple approaches like this to model shading. Instead, I think we should train an ML model to handle shading.
- The angle of the sun
- The azimuth of the sun (unless the `Sun` data source already includes this information for each PV system). This data is useful for a simple ML model that takes the above inputs and estimates the residual of pvlib's forecast for each PV system.
- NWP variables for each PV system (maybe interpolated to 5-minutely)
Maybe use quite long history and forecast durations. Maybe 2 days of forecast and 2 days of history?
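To make the "NWP init time at or before `t0`" rule concrete, here is a minimal sketch using only the standard library. The function name, the 3-hourly run cadence and the example dates are all assumptions for illustration; none of them come from the existing code base.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_init_time_at_or_before(init_times, t0):
    """Return the most recent NWP init time that is <= t0.

    `init_times` must be sorted in ascending order.
    Raises ValueError if every init time is after t0.
    """
    idx = bisect_right(init_times, t0)
    if idx == 0:
        raise ValueError(f"No NWP init time at or before {t0}")
    return init_times[idx - 1]


# Hypothetical NWP runs initialised every 3 hours on one day (assumed cadence).
init_times = [datetime(2022, 1, 1, hour) for hour in range(0, 24, 3)]
t0 = datetime(2022, 1, 1, 10, 30)

nwp_init = latest_init_time_at_or_before(init_times, t0)

# Lead time ("step") needed from that run to reach each 5-minutely target time.
target_times = [t0 + timedelta(minutes=5 * i) for i in range(1, 4)]
lead_times = [t - nwp_init for t in target_times]
```

Selecting the run this way means the example never uses a forecast that would not yet have been initialised at `t0`. It does not account for the delay between a run's init time and when its output becomes available, which a real data source would probably need to model.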
Also include:
- The max actual PV power for the last 12 months (this is probably what we should use to rescale PV power to [0, 1]. Using the max across the entire timeseries won't capture panel degradation etc.)
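A rough sketch of rescaling by the trailing 12-month max, written with the standard library only. The function name, the 365-day window and the toy readings are assumptions; in practice a time-based rolling window in pandas would probably be the more natural tool.

```python
from collections import deque
from datetime import datetime, timedelta


def rescale_by_trailing_max(times, power, window=timedelta(days=365)):
    """Divide each power reading by the max seen in the trailing window.

    `times` must be sorted ascending and aligned with `power`.
    The window for reading i covers (times[i] - window, times[i]].
    Returns values in [0, 1]; 0.0 where the trailing max is zero.
    """
    candidates = deque()  # (time, power) pairs, power strictly decreasing
    rescaled = []
    for t, p in zip(times, power):
        # Drop older readings that can never be the max again.
        while candidates and candidates[-1][1] <= p:
            candidates.pop()
        candidates.append((t, p))
        # Drop readings that have fallen out of the window.
        while candidates[0][0] <= t - window:
            candidates.popleft()
        trailing_max = candidates[0][1]
        rescaled.append(p / trailing_max if trailing_max > 0 else 0.0)
    return rescaled


# Toy readings (made up): the system peaks lower in its second summer.
times = [
    datetime(2020, 6, 1),
    datetime(2021, 1, 1),
    datetime(2021, 7, 1),
    datetime(2021, 12, 1),
]
power = [4.0, 1.0, 3.0, 1.5]
rescaled = rescale_by_trailing_max(times, power)
```

In this toy data the July 2021 reading rescales to 1.0, whereas dividing by the all-time max of 4.0 would give 0.75. Readings near the start of the timeseries have less than 12 months of history behind them, so their trailing max is less trustworthy.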
Before building the data source, do some experiments in a Jupyter Notebook:
- Try computing all of the above and see how well it performs as a PV forecast. If nothing else, this is all a useful baseline algorithm.
- Try extending `pvlib` to consume UKV NWP.
- Experiment with a simple (boosted regression tree?) model which predicts the residual.
- Can we reliably see shading from the last 2 weeks of data? What if the last two weeks were dull weather? Maybe better to compute a "shading plot" (a scatter plot of (actual PV power / expected PV power) vs solar angle) for multiple sun azimuth angles, and show this plot to an ML model. Or fit a curve to the shading plot.
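One possible way to turn the "shading plot" idea into numbers is to bin the actual / expected ratio by sun azimuth and elevation. The sketch below is only an illustration: the function name, the bin widths, the use of the median and the sample values are all assumptions. An upper quantile might separate shading from cloud better than the median does.

```python
from collections import defaultdict
from statistics import median


def shading_curve(samples, azimuth_bin_deg=30, elevation_bin_deg=5, min_expected=1e-6):
    """Summarise actual / expected PV power by sun position.

    `samples` is an iterable of (azimuth_deg, elevation_deg, actual, expected).
    Returns {(azimuth_bin_start, elevation_bin_start): median ratio}.
    Samples with the sun at or below the horizon, or with near-zero
    expected power, are skipped.
    """
    ratios = defaultdict(list)
    for azimuth, elevation, actual, expected in samples:
        if elevation <= 0 or expected <= min_expected:
            continue
        key = (
            int((azimuth % 360) // azimuth_bin_deg) * azimuth_bin_deg,
            int(elevation // elevation_bin_deg) * elevation_bin_deg,
        )
        ratios[key].append(actual / expected)
    return {key: median(values) for key, values in sorted(ratios.items())}


# Made-up samples: low morning sun in the east looks shaded, midday does not.
samples = [
    (95.0, 12.0, 0.2, 1.0),
    (100.0, 13.0, 0.3, 1.0),
    (110.0, 14.0, 0.4, 1.0),
    (185.0, 42.0, 1.8, 2.0),
    (190.0, 44.0, 2.0, 2.0),
    (200.0, 41.0, 0.0, 0.0),   # skipped: no expected power
    (300.0, -5.0, 0.0, 0.0),   # skipped: sun below the horizon
]
curve = shading_curve(samples)
```

The resulting dictionary could be plotted per azimuth bin, passed to an ML model as features, or used as the points for a fitted curve.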
Context
As discussed in https://github.com/openclimatefix/power_perceiver/issues/7, I'm now thinking of predicting PV as a chain of models, each of which predicts the residuals of the previous model.
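To illustrate the mechanics of such a chain, here is a toy sketch in which each stage is fitted to the residuals left by everything before it. `MeanResidualByKey` is a hypothetical stand-in for a real learner (for example a boosted regression tree), the baseline numbers stand in for a physics forecast, and all the data is made up.

```python
from collections import defaultdict
from statistics import mean


class MeanResidualByKey:
    """Toy residual model: learns the mean residual for each feature key."""

    def fit(self, keys, residuals):
        grouped = defaultdict(list)
        for key, residual in zip(keys, residuals):
            grouped[key].append(residual)
        self.mean_residual_ = {k: mean(v) for k, v in grouped.items()}
        return self

    def predict(self, keys):
        # Unseen keys get no correction.
        return [self.mean_residual_.get(k, 0.0) for k in keys]


def fit_chain(baseline, stage_keys, actual):
    """Fit one stage per feature, each on the residuals of the previous stages."""
    prediction = list(baseline)
    stages = []
    for keys in stage_keys:
        residuals = [a - p for a, p in zip(actual, prediction)]
        stage = MeanResidualByKey().fit(keys, residuals)
        stages.append(stage)
        prediction = [p + c for p, c in zip(prediction, stage.predict(keys))]
    return stages, prediction


# Made-up data: a baseline forecast, then corrections by hour and by sky state.
baseline = [2.0, 2.0, 4.0, 4.0]
actual = [1.5, 0.5, 4.0, 3.0]
hour = [8, 8, 12, 12]
sky = ["clear", "cloudy", "clear", "cloudy"]

stages, final = fit_chain(baseline, [hour, sky], actual)
```

Here the chain is fitted and evaluated on the same four points, purely to show how the residuals flow from one stage to the next. It says nothing about how well such a chain would generalise.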