Use better performance metrics
MAE, MSE, and most other standard performance metrics, when averaged over all examples, might show that optical flow performs better on average than an ML-based approach, even if the ML model outperforms optical flow in rarer or more complicated situations. So something like the MSE per hour, or any metric that isn't a single average over all examples, might let us see where optical flow fails and ML models do well.
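A rough sketch of the per-hour idea, in case it helps. The function name `mse_per_group` and the `(example, height, width)` array layout are just assumptions for illustration, not anything that exists in the repo:

```python
import numpy as np

def mse_per_group(pred, target, groups):
    """Return {group: MSE over the examples that belong to that group}.

    `groups` holds one label per example, e.g. the hour of day (hypothetical).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    groups = np.asarray(groups)
    # Squared error averaged over every axis except the example axis.
    per_example = ((pred - target) ** 2).reshape(len(pred), -1).mean(axis=1)
    return {g.item(): float(per_example[groups == g].mean())
            for g in np.unique(groups)}

# Toy data: four 2x2 "images"; only the last example is predicted badly.
pred = np.zeros((4, 2, 2))
target = np.zeros((4, 2, 2))
target[3] = 2.0
hours = [6, 6, 12, 12]

overall_mse = float(((pred - target) ** 2).mean())
by_hour = mse_per_group(pred, target, hours)
```

Here the overall MSE hides that every bit of the error comes from the hour-12 examples, which is the kind of thing the breakdown is meant to surface.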
To combat 'blurry' predictions, we could also try something like the Mean Gradient Error (https://arxiv.org/pdf/1911.09428.pdf), which tries to make sure the model learns sharp edges (unfortunately the GitHub repo implementation doesn't exist anymore). The authors of that paper combined it with MSE.
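Since the original implementation is gone, here is a minimal NumPy sketch of how I understand the idea: take Sobel gradient magnitudes of prediction and target, then take the MSE between those gradient maps. The helper names and the `weight=0.1` default are placeholders of mine, not values confirmed from the paper, and a real training loss would need to be written in a differentiable framework rather than NumPy.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation of a 2D image with `kernel`."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * image[i:i + h - 2, j:j + w - 2]
    return out

def gradient_magnitude(image):
    """Sobel gradient magnitude at every interior pixel."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def mean_gradient_error(pred, target):
    """MSE between the gradient-magnitude maps of `pred` and `target`."""
    diff = gradient_magnitude(pred) - gradient_magnitude(target)
    return float(np.mean(diff ** 2))

def mse_plus_mge(pred, target, weight=0.1):
    """MSE plus a weighted gradient term (weight is a hypothetical default)."""
    mse = float(np.mean((pred - target) ** 2))
    return mse + weight * mean_gradient_error(pred, target)

# A sharp vertical edge versus a completely flat ("blurry") prediction.
target = np.zeros((6, 6))
target[:, 3:] = 1.0
flat = np.full((6, 6), 0.5)
```

The flat prediction has no edges at all, so the gradient term adds a penalty on top of its plain MSE, while a prediction that is only offset by a constant gets no gradient penalty.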
It's possible that "blurry" predictions might still allow a downstream neural net to predict solar PV yield, and that the downstream neural net might learn to interpret the "blurry" predictions as low-confidence, which might be useful when the final PV yield prediction is probabilistic. So, yeah, "crisp" predictions would be awesome for human consumption; but I wouldn't worry toooo much if it's not possible to get super-crisp predictions :) Blurry predictions might still be great for predicting PV yield :)
One possible metric is the Structural Similarity Index Measure (SSIM). There is a differentiable PyTorch version here that I'm going to try out: https://github.com/VainF/pytorch-msssim
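For intuition about what SSIM measures, here is a deliberately simplified sketch that computes the SSIM formula once over the whole image. This is not what the linked library does: proper SSIM uses local (usually Gaussian) windows and averages the result, so treat `global_ssim` as a hypothetical teaching helper only.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM formula applied to whole-image statistics (single window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

# A smooth ramp versus a flat image with the same mean brightness.
image = np.linspace(0.0, 1.0, 64).reshape(8, 8)
flat = np.full((8, 8), 0.5)

same_score = global_ssim(image, image)
flat_score = global_ssim(image, flat)
```

The flat image matches the ramp's mean brightness but has none of its structure, so it scores far below the identical pair.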
Could also try using FID (Fréchet Inception Distance), which correlates with human perception of visual quality, since we want these generated satellite images to look "natural": https://github.com/mseitzer/pytorch-fid
Working on adding some things from this review paper: https://arxiv.org/pdf/2004.05214.pdf. One idea that seems promising for creating a model that can be used as an ensemble is to make the generation probabilistic: make a few predictions, then use the one that best matches the ground truth to calculate the loss for training. Total variation loss has also been added now, and I'm working on adding some more.
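To make the two ideas concrete, here is a small NumPy sketch. `best_of_n_mse` picks the sampled prediction closest to the ground truth and returns only its error, and `total_variation` is the anisotropic variant (sum of absolute neighbor differences). Both names are hypothetical, and I'm assuming the anisotropic form; the version that was actually added may differ.

```python
import numpy as np

def best_of_n_mse(samples, target):
    """Return (loss, index) of the sample with the lowest MSE to `target`.

    `samples` has shape (n, ...) and `target` has shape (...).
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    errors = ((samples - target) ** 2).reshape(len(samples), -1).mean(axis=1)
    best = int(errors.argmin())
    return float(errors[best]), best

def total_variation(image):
    """Anisotropic total variation over the last two (height, width) axes."""
    image = np.asarray(image, dtype=float)
    dh = np.abs(image[..., 1:, :] - image[..., :-1, :]).sum()
    dw = np.abs(image[..., :, 1:] - image[..., :, :-1]).sum()
    return float(dh + dw)

# Three candidate predictions for the same 2x2 target.
target = np.ones((2, 2))
samples = np.stack([np.zeros((2, 2)),
                    np.full((2, 2), 0.9),
                    np.full((2, 2), 2.0)])
loss, best_index = best_of_n_mse(samples, target)
```

In a real training loop only the winning sample would receive gradient, which is what lets the other samples stay diverse.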
LPIPS is another option: https://github.com/S-aiueo32/lpips-pytorch
Try calculating the loss only on the parts of the image stack where things have moved by more than a certain amount: https://github.com/openclimatefix/satflow/issues/88#issuecomment-910233923
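One way this could look, as a sketch only: I'm assuming "moved" can be approximated by the absolute pixel change between the previous frame and the target frame, which may not be what the linked comment has in mind (optical-flow magnitude would be another option). The function name and the `threshold` default are placeholders.

```python
import numpy as np

def motion_masked_mse(pred, target, previous, threshold=0.1):
    """MSE over pixels that changed by more than `threshold`
    between `previous` and `target`; 0.0 if nothing changed that much."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    previous = np.asarray(previous, dtype=float)
    mask = np.abs(target - previous) > threshold
    if not mask.any():
        return 0.0
    return float(((pred - target) ** 2)[mask].mean())

# Only the top-left pixel changes between frames, and the model misses it.
previous = np.zeros((3, 3))
target = np.zeros((3, 3))
target[0, 0] = 1.0
pred = np.zeros((3, 3))

plain_mse = float(((pred - target) ** 2).mean())
masked_mse = motion_masked_mse(pred, target, previous)
```

The plain MSE is diluted by the eight static pixels the model gets right for free, while the masked version scores only the pixel that actually changed.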
Newer loss functions are being added to https://github.com/openclimatefix/nowcasting_utils so they can be shared between predict_pv_yield and SatFlow.