[Paper] Aurora Weather Foundation Model
Arxiv/Blog/Paper Link
https://www.microsoft.com/en-us/research/blog/introducing-aurora-the-first-large-scale-foundation-model-of-the-atmosphere/
Detailed Description
They used Perceiver-based modules to encode and decode the data, with a 3D Swin Transformer U-Net backbone for processing, and LoRA in the fine-tuning stages for longer rollouts. Trained on ERA5, GFS, CMIP6, and more.
One of the few weather foundation models not trained solely on ERA5.
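As a refresher on the LoRA component mentioned above, here is a minimal, generic sketch of low-rank adaptation of a linear layer (not Aurora's actual implementation; `rank` and `alpha` are illustrative values):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adaptation of a frozen linear layer: y = Wx + (alpha / rank) * B(Ax)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors are trained
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as a no-op w.r.t. the base layer
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```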
Context
Code has now been open sourced: https://github.com/microsoft/aurora
Can I work on this @jacobbieker ?
Yep! That would be great!
@jacobbieker, could you provide a bit more background on this issue? It would help in determining an approach and a model for implementation.
Hey @jacobbieker, we can consider two potential approaches for making predictions with the Aurora model:
- Autoregressive Roll-Outs
This approach is ideal for multi-step forecasting, particularly useful for long-term weather forecasting or other scenarios requiring iterative predictions.
```python
import torch

from aurora import rollout

model = model.to("cuda")

with torch.inference_mode():
    preds = [pred.to("cpu") for pred in rollout(model, batch, steps=10)]
```

Here, `aurora.rollout` is used for autoregressive rollouts. In the list comprehension, predictions are immediately moved to the CPU after every step to prevent GPU memory buildup.
Every element of `preds` is an instance of `aurora.Batch`.

Recommended Models:
- Aurora 0.25° Fine-Tuned
- Aurora 0.1° Fine-Tuned
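The `model` referenced in the rollout above has to be loaded first; this roughly follows the microsoft/aurora README (the checkpoint name is from the README at the time of writing and may change, so verify against the current docs):

```python
from aurora import Aurora

# Load the 0.25° fine-tuned model from the Hugging Face checkpoint.
model = Aurora()
model.load_checkpoint("microsoft/aurora", "aurora-0.25-finetuned.ckpt")
model.eval()
```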
- One-Step Predictions
This approach is more suitable for testing and validating the model's short-term prediction capabilities.
```python
from datetime import datetime

import torch

from aurora import Batch, Metadata

batch = Batch(
    surf_vars={k: torch.randn(1, 2, 17, 32) for k in ("2t", "10u", "10v", "msl")},
    static_vars={k: torch.randn(17, 32) for k in ("lsm", "z", "slt")},
    atmos_vars={k: torch.randn(1, 2, 4, 17, 32) for k in ("z", "u", "v", "t", "q")},
    metadata=Metadata(
        lat=torch.linspace(90, -90, 17),
        lon=torch.linspace(0, 360, 32 + 1)[:-1],
        time=(datetime(2020, 6, 1, 12, 0),),
        atmos_levels=(100, 250, 500, 850),
    ),
)
```

Recommended Models:
- Aurora 0.25° Pretrained
- Aurora 0.25° Pretrained Small
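Running a single step on this batch then mirrors the microsoft/aurora README (class and checkpoint names are from the README at the time of writing):

```python
from aurora import AuroraSmall

model = AuroraSmall()
model.load_checkpoint("microsoft/aurora", "aurora-0.25-small-pretrained.ckpt")
model.eval()

with torch.inference_mode():
    pred = model.forward(batch)  # a single forward step; pred is again an aurora.Batch
```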
Hey, for this issue the idea is more to reimplement the Aurora model in this repo and make it work with the same inputs/outputs as the other models here. The idea behind this is that we can then switch and swap different components from different papers/models. So we wouldn't want just their inference code, but rather an implementation of the Aurora components here as well. Examples of this are the GenCast and Fengwu-GHR code that have both been added here.
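To make that concrete, a hypothetical top-level skeleton could look like the sketch below (all names are illustrative, not from the Aurora codebase), with each stage injectable so components can be swapped with those from other implementations in this repo:

```python
import torch
import torch.nn as nn

class AuroraModel(nn.Module):
    """Hypothetical composition: Perceiver encoder -> 3D Swin backbone -> Perceiver decoder.

    Each stage is an injectable nn.Module so components can be swapped with
    those from other models in this repo (e.g. GenCast, Fengwu-GHR).
    """

    def __init__(self, encoder: nn.Module, processor: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder      # variables/levels -> latent tokens
        self.processor = processor  # latent -> latent (3D Swin Transformer U-Net)
        self.decoder = decoder      # latent tokens -> variables/levels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        latent = self.encoder(x)
        latent = self.processor(latent)
        return self.decoder(latent)
```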
Proposed Implementation Approach for Aurora:

1. Core Architecture
   - Perceiver encoder/decoder + 3D Swin Transformer backbone
   - Modular design for component swapping with existing models
   - LoRA integration for efficient fine-tuning
2. Data Pipeline
   - Unified preprocessing for ERA5/GFS/CMIP6
   - Standard pressure levels and grid resolution
   - Normalized weather variables
3. Integration Strategy
   - Compatible interfaces with GenCast/Fengwu-GHR
   - Swappable encoder/processor/decoder
   - Standardized I/O formats
4. Evaluation
   - Weather-specific metrics (RMSE, ACC; see the sketch after this list)
   - Skill scores vs. climatology
   - Performance benchmarking
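A minimal sketch of the two standard metrics from item 4, latitude-weighted RMSE and the anomaly correlation coefficient (ACC) against a climatology (shapes and weighting conventions here are assumptions, not repo API):

```python
import torch

def lat_weighted_rmse(pred: torch.Tensor, target: torch.Tensor, lat: torch.Tensor) -> torch.Tensor:
    """RMSE over a (lat, lon) grid, weighted by cos(latitude) to account for grid-cell area."""
    w = torch.cos(torch.deg2rad(lat)).clamp(min=0)[:, None]  # shape (lat, 1), broadcasts over lon
    w = w / w.mean()
    return torch.sqrt((w * (pred - target) ** 2).mean())

def acc(pred: torch.Tensor, target: torch.Tensor, climatology: torch.Tensor) -> torch.Tensor:
    """Anomaly correlation coefficient: correlation of forecast and observed anomalies."""
    pred_anom = pred - climatology
    target_anom = target - climatology
    return (pred_anom * target_anom).sum() / torch.sqrt(
        (pred_anom ** 2).sum() * (target_anom ** 2).sum()
    )
```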
Questions for discussion:
- Priority order for implementation?
- Preferred grid resolution for initial release?
- Additional metrics needed beyond standard set?
These are the things I've planned out for this implementation of Aurora. Is there any specific thing you would like to add? @jacobbieker
@jacobbieker
Just a quick note, there is no need to comment just to tag me, I'll respond to comments when I can. Adding comments just to tag me can clutter up the issue a little.
For this implementation, I don't think you need to make the data pipeline; it's more the architecture we are interested in. So sounds good for 1., and we can skip 2.
For 3., the interface should be compatible with how GenCast/Fengwu-GHR do it, which is compatible with how the rest of this repo does it. You shouldn't need to worry about the I/O formats; as long as the implementation can take the same input shapes/setup as those two, it'll work fine.
For 4., this would be a different set of PRs/issues to work on. It would be good to have metrics on how well it does, and we would like to train the implementation, but I think that would be more of a follow-on, or a verification of the model architecture being correct. What I would suggest is adding unit tests and integration tests as part of adding the implementation, to ensure the individual components and the whole implementation work.
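A pytest-style shape test along those lines might look like the following, reusing the hypothetical `AuroraModel` skeleton sketched earlier, with identity stages standing in for the real components:

```python
import torch
import torch.nn as nn

def test_aurora_forward_preserves_shape():
    # Identity stages stand in for the real Perceiver/Swin components.
    model = AuroraModel(encoder=nn.Identity(), processor=nn.Identity(), decoder=nn.Identity())
    x = torch.randn(1, 2, 4, 17, 32)  # (batch, history, levels, lat, lon), as in the example batch
    out = model(x)
    assert out.shape == x.shape, "model output must match the input grid shape"
```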
- The priority is the core architecture; 3. is handled as part of that. Evaluation can come last, after it's implemented.
- It should be flexible enough to take arbitrary resolutions, but I would aim for testing at the resolutions from the paper.
- We'll want some more metrics, but those would be handled in other repos.