Add MetNet-3
Arxiv/Blog/Paper Link
https://arxiv.org/pdf/2306.06079v2.pdf
Detailed Description
MetNet-3 was released as a paper
Context
It would be better to have all versions of MetNet in this repo. This densified forecast could also be useful for the irradiance modelling, going from PV sites to a dense forecast.
They use a mix of dense and sparse inputs, as well as dense and sparse outputs.
- Uses HRRR output (the assimilation state) as an extra training target, but those predictions aren't actually used
- Uses a very large center context of 2500km, an extra-large area of 5000km, and forecasts out to 24 hours
- Masks out 25% of weather stations per example, to help with densification (see the sketch after this list)
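A minimal sketch of that station dropout, assuming per-example station tensors (the function name, shapes, and the zero-fill for hidden stations are assumptions, not taken from the paper's code):

```python
import torch

def mask_stations(station_values: torch.Tensor,
                  station_present: torch.Tensor,
                  drop_frac: float = 0.25) -> tuple[torch.Tensor, torch.Tensor]:
    """Randomly hide a fraction of the input weather stations for one example.

    station_values:  (num_stations, num_variables) observed values
    station_present: (num_stations,) bool mask of stations that reported
    The hidden stations are still available as targets, which is what pushes
    the model to interpolate ("densify") between the stations it can still see.
    """
    present_idx = station_present.nonzero(as_tuple=True)[0]
    n_drop = int(drop_frac * len(present_idx))
    drop_idx = present_idx[torch.randperm(len(present_idx))[:n_drop]]

    masked_values = station_values.clone()
    masked_values[drop_idx] = 0.0             # zero out the hidden observations
    masked_present = station_present.clone()
    masked_present[drop_idx] = False          # mark them as missing in the input
    return masked_values, masked_present
```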
More notes:
- They use 942 weather stations, and train by assigning the values of each weather observation to the 4x4km pixel in which the station lies. If there are multiple weather stations in a single pixel, they average the values (see the rasterisation sketch after this list).
- During evaluation they don't give the model any past data from the held-out weather stations, so those stations are compared against like any other grid point. They also run an evaluation where the station's own history is given, which produces hyperlocal forecasts that are a decent improvement (could be very useful for site-level forecasts).
- Inputs include a topographical embedding, instead of directly giving a topo map or land/sea mask: a grid with 4km stride, with 20 parameters per grid point: "For each input example, we calculate the topographical embedding of each input pixel center by bilinearly interpolating the embeddings from the grid. The embedding parameters are trained together with other model parameters similarly to embeddings used in NLP." (see the embedding sketch after this list)
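A rough sketch of that station-to-pixel rasterisation with averaging (names, shapes and the zero fill for empty pixels are assumptions for illustration):

```python
import torch

def rasterise_stations(values: torch.Tensor,      # (num_stations,) one variable, e.g. 2m temperature
                       rows: torch.Tensor,         # (num_stations,) long, pixel row on the 4km grid
                       cols: torch.Tensor,         # (num_stations,) long, pixel column on the 4km grid
                       grid_hw: tuple[int, int]) -> tuple[torch.Tensor, torch.Tensor]:
    """Scatter station observations into a 4km grid, averaging stations that share a pixel."""
    h, w = grid_hw
    flat_idx = rows * w + cols
    sums = torch.zeros(h * w).index_add_(0, flat_idx, values)
    counts = torch.zeros(h * w).index_add_(0, flat_idx, torch.ones_like(values))
    target = sums / counts.clamp(min=1)            # empty pixels stay at 0; they're excluded via `valid`
    valid = counts.reshape(h, w) > 0               # the loss is only computed where a station exists
    return target.reshape(h, w), valid
```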
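And a minimal sketch of a learned topographical embedding grid with bilinear interpolation, assuming `torch.nn.functional.grid_sample` for the interpolation and normalised pixel-centre coordinates (the 20 parameters per grid point come from the quote above; everything else is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopoEmbedding(nn.Module):
    """Learned embedding grid at 4km stride, bilinearly interpolated at pixel centres."""

    def __init__(self, grid_h: int, grid_w: int, dim: int = 20):
        super().__init__()
        # One trainable vector of `dim` parameters per grid point, trained with the rest of the model.
        self.grid = nn.Parameter(torch.randn(1, dim, grid_h, grid_w) * 0.02)

    def forward(self, pixel_xy: torch.Tensor) -> torch.Tensor:
        """pixel_xy: (N, 2) pixel-centre (x, y) coordinates, normalised to [-1, 1].

        Returns (N, dim) embeddings, bilinearly interpolated from the grid.
        """
        locs = pixel_xy.view(1, -1, 1, 2)                     # grid_sample wants (B, H_out, W_out, 2)
        sampled = F.grid_sample(self.grid, locs, mode="bilinear", align_corners=True)
        return sampled.squeeze(0).squeeze(-1).transpose(0, 1)  # (N, dim)
```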
Architecture:
"Data is then processed by a U-Net backbone, which starts with applying two convolutional ResNetblocks [9] and downsampling the data to 8 km resolution. We then pad the internal representation spatiallywith zeros to 4992 km by 4992 km square and concatenate with the low-resolution, large-context inputs.Afterward, we again apply two convolutional ResNet blocks and downsample the representation to 16 kmresolution. Convolutional ResNet blocks can only handle local interactions and for longer lead times closeto 24 hours, the targets may depend on the entire input. In order to facilitate that, we process the dataat 16 km resolution using a modified version of MaxVit [22] network. MaxVit is a version of Vision Trans-former (ViT, [6]) with attention over local neighbourhood as well as global gridded attention. We modifythe MaxVit architecture by removing all MLP sub-blocks, adding skip connections (to the MaxVit output)after each MaxVit sub-block, and using normalized keys and queries in attention [5].Afterwards, we take the central crop of size 768 km by 768 km, and gradually upsample the representationto 4 km resolution using skip connections from the downsampling path, at which point we again take acentral crop, this time of size 512 km by 512 km. The network outputs a categorical distribution over 256bins for each of 6 ground weather variables and a deterministic prediction for each of 617 assimilated weatherstate channels using an MLP with one hidden layer applied to the representation at 4 km resolution. Forprecipitation (both instantaneous rate and hourly accumulation), we upsample the representation to 1 kmresolution and output for each pixel a categorical distribution over 512 bins. "
- Lead time is included by applying a time embedding both additively and multiplicatively to blocks, same as MetNet-2 (see the conditioning sketch after this list)
- Forecast lead times aren't sampled uniformly during training; they follow an exponential drop-off, with t=0 having 10 times the probability of being sampled vs t=24h (sampling sketch after this list)
- Trained with a cross-entropy loss, after rescaling the losses to be of similar magnitudes. MSE is used for the forecast of the HRRR assimilation state; those predictions weren't looked at, they just helped training (loss sketch after this list)
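For the lead-time conditioning bullet above, here is a small FiLM-style sketch of applying a lead-time embedding both multiplicatively and additively; the module name and shapes are assumptions:

```python
import torch
import torch.nn as nn

class LeadTimeConditioning(nn.Module):
    """Condition feature maps on the forecast lead time with a per-channel scale and shift."""

    def __init__(self, num_lead_times: int, channels: int):
        super().__init__()
        self.scale = nn.Embedding(num_lead_times, channels)
        self.shift = nn.Embedding(num_lead_times, channels)
        nn.init.ones_(self.scale.weight)    # start as identity: scale 1, shift 0
        nn.init.zeros_(self.shift.weight)

    def forward(self, x: torch.Tensor, lead_time: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); lead_time: (B,) integer index of the forecast step
        scale = self.scale(lead_time)[:, :, None, None]   # multiplicative term
        shift = self.shift(lead_time)[:, :, None, None]   # additive term
        return x * scale + shift
```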
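For the lead-time sampling bullet, a quick sketch of an exponential drop-off whose decay rate is derived from the stated 10:1 ratio between t=0 and t=24h (the 15-minute step is an assumption):

```python
import math
import torch

lead_times_h = torch.arange(0, 24.25, 0.25)   # assumed 15-minute steps out to 24 hours
decay = math.log(10.0) / 24.0                  # chosen so that p(0h) / p(24h) = 10
weights = torch.exp(-decay * lead_times_h)
probs = weights / weights.sum()

# Draw one lead time per training example.
idx = torch.multinomial(probs, num_samples=1)
lead_time = lead_times_h[idx]
```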
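And for the training-loss bullet, a hedged sketch of one way to combine the cross-entropy and MSE terms with rescaling weights (the weights, shapes and function name are placeholders, not the paper's values):

```python
import torch
import torch.nn.functional as F

def metnet3_style_loss(surface_logits, surface_bins, precip_logits, precip_bins,
                       hrrr_pred, hrrr_target, valid_mask,
                       w_surface=1.0, w_precip=1.0, w_hrrr=0.1):
    """Cross-entropy on the binned outputs, MSE on the HRRR auxiliary channels.

    surface_logits: (B, 256, 6, H, W) logits over 256 bins for 6 ground variables
    surface_bins:   (B, 6, H, W) long, bin targets at station pixels
    precip_logits:  (B, 512, H*4, W*4) precipitation logits at 1km resolution
    precip_bins:    (B, H*4, W*4) long, bin targets
    hrrr_pred/target: (B, 617, H, W) deterministic auxiliary prediction and target
    valid_mask:     (B, 6, H, W) bool, True only where a station observation exists
    """
    ce_surface = F.cross_entropy(surface_logits, surface_bins, reduction="none")
    ce_surface = (ce_surface * valid_mask).sum() / valid_mask.sum().clamp(min=1)
    ce_precip = F.cross_entropy(precip_logits, precip_bins)
    mse_hrrr = F.mse_loss(hrrr_pred, hrrr_target)
    # The weights rescale the terms to comparable magnitudes (placeholder values here).
    return w_surface * ce_surface + w_precip * ce_precip + w_hrrr * mse_hrrr
```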
Author Notes:
- There is a tradeoff in performance between the precipitation forecast and the ground variables: improving one resulted in decreasing performance for the other
- To work around this, they trained primarily a precipitation model, then "afterwards we increase the weight of the OMO loss by 100x compared to the precipitation model and finetune the model. Moreover, we disable topographical embedding (fix them to zeros) for this OMO-specific model because topographical embedding may hinder transfer between different locations, which is crucial for learning only from targets present at a sparse set of locations."
- Loss scaling (rough sketch of the two-stage finetune after this list)
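A very rough sketch of that two-stage finetune: upweight the OMO loss by 100x and fix the topographical embeddings to zeros. Everything here apart from the 100x factor (names, base weights, module layout) is an assumption about how one could wire it up:

```python
import torch

# Stage 1: train the precipitation-focused model with some base loss weights (placeholders).
base_loss_weights = {"precip": 1.0, "omo": 1.0, "hrrr": 0.1}

# Stage 2: finetune an OMO-specific model from the stage-1 checkpoint,
# upweighting the OMO (surface observation) loss by 100x.
omo_loss_weights = {**base_loss_weights, "omo": base_loss_weights["omo"] * 100.0}

def disable_topo_embedding(model: torch.nn.Module) -> None:
    """Fix the topographical embeddings to zeros and stop training them, so the
    OMO-specific model has to transfer between locations rather than memorise
    station positions (assumes the TopoEmbedding module sketched earlier)."""
    grid = model.topo_embedding.grid
    torch.nn.init.zeros_(grid)
    grid.requires_grad_(False)
```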
ASOS 1 minute weather data (public and freely accessible): https://madis.ncep.noaa.gov/madis_OMO.shtml
Also, they mention that MetNet-3 is being used for operational forecasts in Google Search already
Sounds great! Well done for spotting this publication!
MetNet-3 uses a modified MaxViT model in the centre of the U-Net. Here's the MaxViT paper. The MaxViT authors have also released TensorFlow code. But, TBH, MaxViT sounds so simple that it's probably easier to re-implement MaxViT in PyTorch directly from the MaxViT paper :slightly_smiling_face:
Yeah, timm also has a PyTorch implementation of MaxViT, so we could either use it directly or base ours off of it
Found a website that has easily downloadable weather station data for the whole world, including the UK and other countries: https://github.com/akrherz/iem/blob/main/scripts/asos/iem_scraper_example.py from https://mesonet.agron.iastate.edu/request/download.phtml?network=GB__ASOS
Found an implementation of MetNet-3: https://github.com/lucidrains/metnet3-pytorch
Here is another implementation, already finished: https://github.com/kyegomez/metnet3