[Paper] Aurora Weather Foundation Model
Arxiv/Blog/Paper Link
https://www.microsoft.com/en-us/research/blog/introducing-aurora-the-first-large-scale-foundation-model-of-the-atmosphere/
Detailed Description
They used Perceiver-based modules to encode and decode the data, with a 3D Swin Transformer U-Net backbone for processing, and LoRA in the fine-tuning stages for longer rollouts. Trained on ERA5, GFS, CMIP6, and more.
One of the few weather foundation models not trained solely on ERA5.
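As a refresher on the LoRA component mentioned above, here is a minimal, generic sketch of low-rank adaptation of a linear layer (not Aurora's actual implementation; `rank` and `alpha` are illustrative values):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adaptation of a frozen linear layer: y = Wx + (alpha / rank) * B(Ax)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors are trained
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as a no-op w.r.t. the base layer
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```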
Context
Code has now been open sourced: https://github.com/microsoft/aurora
Can I work on this @jacobbieker ?
Yep! That would be great!
@jacobbieker, could you provide a bit more background on this issue? It would help in determining an approach and a model for implementation.
Hey @jacobbieker, we can consider two potential approaches for making predictions with the Aurora model:
- Autoregressive Roll-Outs
This approach is ideal for multi-step forecasting, particularly useful for long-term weather forecasting or other scenarios requiring iterative predictions.
```python
import torch

from aurora import rollout

model = model.to("cuda")

with torch.inference_mode():
    preds = [pred.to("cpu") for pred in rollout(model, batch, steps=10)]
```

Here, `aurora.rollout` is used for autoregressive rollouts. In the list comprehension, predictions are immediately moved to the CPU after every step to prevent GPU memory buildup.
Every element of `preds` is an instance of `aurora.Batch`.

Recommended Models:
- Aurora 0.25° Fine-Tuned
- Aurora 0.1° Fine-Tuned
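The `model` referenced in the rollout above has to be loaded first; this roughly follows the microsoft/aurora README (the checkpoint name is from the README at the time of writing and may change, so verify against the current docs):

```python
from aurora import Aurora

# Load the 0.25° fine-tuned model from the Hugging Face checkpoint.
model = Aurora()
model.load_checkpoint("microsoft/aurora", "aurora-0.25-finetuned.ckpt")
model.eval()
```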
- One-Step Predictions
This approach is more suitable for testing and validating the model's short-term prediction capabilities.
```python
from datetime import datetime

import torch

from aurora import Batch, Metadata

batch = Batch(
    surf_vars={k: torch.randn(1, 2, 17, 32) for k in ("2t", "10u", "10v", "msl")},
    static_vars={k: torch.randn(17, 32) for k in ("lsm", "z", "slt")},
    atmos_vars={k: torch.randn(1, 2, 4, 17, 32) for k in ("z", "u", "v", "t", "q")},
    metadata=Metadata(
        lat=torch.linspace(90, -90, 17),
        lon=torch.linspace(0, 360, 32 + 1)[:-1],
        time=(datetime(2020, 6, 1, 12, 0),),
        atmos_levels=(100, 250, 500, 850),
    ),
)
```

Recommended Models:
- Aurora 0.25° Pretrained
- Aurora 0.25° Pretrained Small
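Running a single step on this batch then mirrors the microsoft/aurora README (class and checkpoint names are from the README at the time of writing):

```python
from aurora import AuroraSmall

model = AuroraSmall()
model.load_checkpoint("microsoft/aurora", "aurora-0.25-small-pretrained.ckpt")
model.eval()

with torch.inference_mode():
    pred = model.forward(batch)  # a single forward step; pred is again an aurora.Batch
```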
Hey, for this issue the idea is more to reimplement the Aurora model in this repo and make it work with the same inputs/outputs as the other models here. The idea behind this is that we can then switch and swap different components from different papers/models. So we wouldn't want just their inference code, but rather an implementation of the Aurora components here as well. Examples of this are the GenCast and Fengwu-GHR code that have both been added here.
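To make that concrete, a hypothetical top-level skeleton could look like the sketch below (all names are illustrative, not from the Aurora codebase), with each stage injectable so components can be swapped with those from other implementations in this repo:

```python
import torch
import torch.nn as nn

class AuroraModel(nn.Module):
    """Hypothetical composition: Perceiver encoder -> 3D Swin backbone -> Perceiver decoder.

    Each stage is an injectable nn.Module so components can be swapped with
    those from other models in this repo (e.g. GenCast, Fengwu-GHR).
    """

    def __init__(self, encoder: nn.Module, processor: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder      # variables/levels -> latent tokens
        self.processor = processor  # latent -> latent (3D Swin Transformer U-Net)
        self.decoder = decoder      # latent tokens -> variables/levels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        latent = self.encoder(x)
        latent = self.processor(latent)
        return self.decoder(latent)
```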
Proposed Implementation Approach for Aurora:

1. Core Architecture
   - Perceiver encoder/decoder + 3D Swin Transformer backbone
   - Modular design for component swapping with existing models
   - LoRA integration for efficient fine-tuning
2. Data Pipeline
   - Unified preprocessing for ERA5/GFS/CMIP6
   - Standard pressure levels and grid resolution
   - Normalized weather variables
3. Integration Strategy
   - Compatible interfaces with GenCast/Fengwu-GHR
   - Swappable encoder/processor/decoder
   - Standardized I/O formats
4. Evaluation
   - Weather-specific metrics (RMSE, ACC; see the sketch after this list)
   - Skill scores vs. climatology
   - Performance benchmarking
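A minimal sketch of the two standard metrics from item 4, latitude-weighted RMSE and the anomaly correlation coefficient (ACC) against a climatology (shapes and weighting conventions here are assumptions, not repo API):

```python
import torch

def lat_weighted_rmse(pred: torch.Tensor, target: torch.Tensor, lat: torch.Tensor) -> torch.Tensor:
    """RMSE over a (lat, lon) grid, weighted by cos(latitude) to account for grid-cell area."""
    w = torch.cos(torch.deg2rad(lat)).clamp(min=0)[:, None]  # shape (lat, 1), broadcasts over lon
    w = w / w.mean()
    return torch.sqrt((w * (pred - target) ** 2).mean())

def acc(pred: torch.Tensor, target: torch.Tensor, climatology: torch.Tensor) -> torch.Tensor:
    """Anomaly correlation coefficient: correlation of forecast and observed anomalies."""
    pred_anom = pred - climatology
    target_anom = target - climatology
    return (pred_anom * target_anom).sum() / torch.sqrt(
        (pred_anom ** 2).sum() * (target_anom ** 2).sum()
    )
```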
Questions for discussion:
- Priority order for implementation?
- Preferred grid resolution for initial release?
- Additional metrics needed beyond standard set?
These are the things I've planned out for this implementation of Aurora. Is there any specific thing you would like to add? @jacobbieker
@jacobbieker
Just a quick note, there is no need to comment just to tag me, I'll respond to comments when I can. Adding comments just to tag me can clutter up the issue a little.
For this implementation, I don't think you need to make the data pipeline; it's more the architecture we are interested in. So sounds good for 1., and we can skip 2.
For 3., the interface should be compatible with how GenCast/Fengwu-GHR do it, which is compatible with how the rest of this repo does it. You shouldn't need to worry about the I/O formats; as long as the implementation can take the same input shapes/setup as those two, it'll work fine.
For 4., this would be a different set of PRs/issues to work on. It would be good to have metrics on how well it does, and we would like to train the implementation, but I think that would be more of a follow-on, or a verification of the model architecture being correct. What I would suggest is adding unit tests and integration tests as part of adding the implementation, to ensure the individual components and the whole implementation work.
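A pytest-style shape test along those lines might look like the following, reusing the hypothetical `AuroraModel` skeleton sketched earlier, with identity stages standing in for the real components:

```python
import torch
import torch.nn as nn

def test_aurora_forward_preserves_shape():
    # Identity stages stand in for the real Perceiver/Swin components.
    model = AuroraModel(encoder=nn.Identity(), processor=nn.Identity(), decoder=nn.Identity())
    x = torch.randn(1, 2, 4, 17, 32)  # (batch, history, levels, lat, lon), as in the example batch
    out = model(x)
    assert out.shape == x.shape, "model output must match the input grid shape"
```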
- The priority is the core architecture; 3. is handled as part of that. Evaluation can come last, after it's implemented.
- It should be flexible enough to take arbitrary resolutions, but I would aim for testing at the resolutions from the paper.
- We'll want some more metrics, but those would be handled in other repos.