Use better performance metrics

Open jacobbieker opened this issue 4 years ago • 8 comments

Aggregate metrics such as MAE or MSE, averaged over all examples, might show that optical flow performs better on average than an ML-based approach, even if the ML model outperforms optical flow in rarer or more complicated situations. So something like MSE per hour, or any breakdown other than a single MSE over all examples, might let us see where optical flow fails and ML models do well.
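A minimal sketch of one such breakdown, reading "per hour" as "per forecast step" (that reading, the function name, and the `(examples, timesteps, height, width)` array layout are assumptions, not something fixed by this issue):

```python
import numpy as np


def mse_per_lead_time(pred, target):
    """MSE for each forecast step, averaged over examples and pixels.

    pred, target: arrays of shape (examples, timesteps, height, width).
    Returns an array of shape (timesteps,).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    squared_error = (pred - target) ** 2
    # Average over every axis except the timestep axis (axis 1).
    return squared_error.mean(axis=(0, 2, 3))
```

Computing this for both the optical flow baseline and the ML model gives one curve per method, so a model that only wins at longer lead times is not hidden by a single overall average. Averaging over axes `(1, 2, 3)` instead would give a per-example score, which helps pick out the rarer situations.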

jacobbieker avatar Jun 03 '21 13:06 jacobbieker

To combat 'blurry' predictions, we could also try something like the Mean Gradient Error (https://arxiv.org/pdf/1911.09428.pdf), which tries to make sure the model learns sharp edges. The authors of that paper combined it with MSE. Unfortunately, the GitHub repo implementation doesn't exist anymore.
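Since the reference implementation is gone, here is a rough NumPy sketch of the idea: compare Sobel gradient magnitudes of the prediction and the target, then add that term to MSE. The function names, the edge padding, and the `weight` default are assumptions for illustration, not values taken from the paper.

```python
import numpy as np


def sobel_magnitude(img):
    """Gradient magnitude of a 2D image using 3x3 Sobel filters."""
    img = np.asarray(img, dtype=float)
    p = np.pad(img, 1, mode="edge")  # replicate borders so output keeps the input shape
    # Horizontal gradient: right neighbours minus left neighbours (row weights 1, 2, 1).
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]
    )
    # Vertical gradient: lower neighbours minus upper neighbours (column weights 1, 2, 1).
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]
    )
    return np.sqrt(gx**2 + gy**2)


def mean_gradient_error(pred, target):
    """Mean squared difference between the two gradient-magnitude maps."""
    return float(np.mean((sobel_magnitude(target) - sobel_magnitude(pred)) ** 2))


def mixed_loss(pred, target, weight=0.1):
    """MSE plus a weighted gradient term; `weight` is a hypothetical default."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = float(np.mean((pred - target) ** 2))
    return mse + weight * mean_gradient_error(pred, target)
```

A flat, "blurry" prediction of a sharp cloud edge gets a large gradient term even when its MSE is moderate, which is the behaviour we want to penalise. For training, the same arithmetic would need to be written with differentiable tensor ops.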

jacobbieker avatar Jun 22 '21 08:06 jacobbieker

It's possible that "blurry" predictions might still allow a downstream neural net to predict solar PV yield, and that the downstream neural net might learn to interpret the "blurry" predictions as low-confidence, which might be useful when the final PV yield prediction is probabilistic. So, yeah, "crisp" predictions would be awesome for human consumption; but I wouldn't worry toooo much if it's not possible to get super-crisp predictions :) Blurry predictions might still be great for predicting PV yield :)

JackKelly avatar Jun 22 '21 11:06 JackKelly

One possible metric is the Structural Similarity Index Measure (SSIM). There is a differentiable PyTorch version here that I'm going to try out: https://github.com/VainF/pytorch-msssim
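For intuition only, a simplified NumPy sketch of the SSIM formula computed from whole-image statistics. This is not the linked library's implementation: the standard metric uses a sliding Gaussian window and averages the local scores, and the function name and `data_range` default here are assumptions.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """SSIM from global means/variances (no sliding window), for illustration."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Stabilising constants from the original SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def ssim_loss(pred, target, data_range=1.0):
    """Loss form: 0 when the images match, larger as structure diverges."""
    return 1.0 - global_ssim(pred, target, data_range)
```

Identical images score 1, so `1 - SSIM` is the usual way to turn the metric into something to minimise.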

jacobbieker avatar Jul 22 '21 13:07 jacobbieker

Could also try using Fréchet Inception Distance (FID), which correlates with human perception of visual quality, since we want these generated satellite images to look "natural": https://github.com/mseitzer/pytorch-fid

jacobbieker avatar Aug 17 '21 07:08 jacobbieker

Working on adding some things from this review paper: https://arxiv.org/pdf/2004.05214.pdf. One idea that seems promising for creating a model that can be used as an ensemble is to make the generation probabilistic: make a few predictions, then use the one that best matches the ground truth to calculate the loss for training. Total variation loss has also been added now, and I'm working on adding some more.
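A minimal sketch of both ideas in NumPy, assuming MSE as the match criterion and an anisotropic (absolute-difference) form of total variation. The function names and array layouts are hypothetical and are not the code that was added to the repo.

```python
import numpy as np


def best_of_n_mse(samples, target):
    """Return (lowest MSE, index of that sample) among several predictions.

    samples: array of shape (n_samples, ...) holding sampled predictions.
    target:  array with the shape of a single sample.
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    # One MSE per sample: average over every axis except the sample axis.
    per_sample = ((samples - target) ** 2).mean(axis=tuple(range(1, samples.ndim)))
    best = int(np.argmin(per_sample))
    return float(per_sample[best]), best


def total_variation(img):
    """Mean absolute difference between vertically and horizontally adjacent pixels."""
    img = np.asarray(img, dtype=float)
    vertical = np.abs(img[1:, :] - img[:-1, :]).mean()
    horizontal = np.abs(img[:, 1:] - img[:, :-1]).mean()
    return float(vertical + horizontal)
```

In training, only the selected sample would receive gradient, so the other samples stay free to cover different plausible futures.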

jacobbieker avatar Sep 01 '21 09:09 jacobbieker

LPIPS is another one: https://github.com/S-aiueo32/lpips-pytorch

jacobbieker avatar Sep 01 '21 10:09 jacobbieker

Try calculating the loss only on the parts of the image stack where things have moved by more than a certain amount: https://github.com/openclimatefix/satflow/issues/88#issuecomment-910233923
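One way this could look, as a sketch: use the absolute change between consecutive ground-truth frames as a stand-in for "moved", and average the squared error only over pixels above a threshold. The frame-differencing proxy, the function name, and the arguments are all assumptions; the linked comment may intend a different definition of motion.

```python
import numpy as np


def motion_masked_mse(pred, target, last_input, threshold):
    """MSE restricted to pixels whose ground truth changed by more than `threshold`.

    pred, target: arrays of shape (timesteps, height, width).
    last_input:   array of shape (height, width), the last observed frame.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    last_input = np.asarray(last_input, dtype=float)
    # Prepend the last observed frame so each target frame has a predecessor.
    frames = np.concatenate([last_input[None], target], axis=0)
    change = np.abs(np.diff(frames, axis=0))  # shape (timesteps, height, width)
    mask = change > threshold
    if not mask.any():
        return 0.0  # nothing moved enough, so there is nothing to score
    return float(((pred - target) ** 2)[mask].mean())
```

A persistence-style prediction scores well under plain MSE when most of the image is static, but poorly here, because the static pixels no longer dilute the error.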

jacobbieker avatar Sep 01 '21 12:09 jacobbieker

Newer loss functions are being put in https://github.com/openclimatefix/nowcasting_utils to share between predict_pv_yield and SatFlow

jacobbieker avatar Sep 17 '21 17:09 jacobbieker