
era5 training to reproduce Keisler?

Open · paapu88 opened this issue 2 years ago · 10 comments

Dear Developers,

Is there any code for training the network on ERA5 data to reproduce the Keisler paper?

There seem to be scripts in graph_weather/train, but they are all for GFS data?

Best regards, Markus

paapu88 · Aug 05 '22 13:08

Not currently. I've been slowly pushing ERA5 data to Hugging Face here: https://huggingface.co/datasets/openclimatefix/era5, and the code for training with GFS should be fairly simple to change to use ERA5. I would welcome any contributions to add an ERA5 training script! I will probably get around to it myself, just not very soon.

jacobbieker · Aug 05 '22 13:08

  • Thanks for the quick update. I'll make an ERA5 fork and start working on it.
  • I have downloaded ERA5 data for 1959-2022 at a 6-hour interval on a 1-degree grid to Oracle Cloud. I took some extra variables compared to Keisler. I can copy that data somewhere if that would help (29 GB/year); see the sketch after this list for the kind of CDS request involved.
  • I won't close this, because I plan to ask ERA5-training-related questions here...
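
For reference, a download like the one described above would typically go through the Copernicus CDS API. A minimal sketch, assuming `cdsapi` is set up; the variable and pressure-level lists here are placeholders, not the exact set used above:

```python
# Minimal sketch of pulling 1-degree, 6-hourly ERA5 pressure-level fields via
# the Copernicus CDS API. Variable/level lists and the output path are
# placeholders; one request per month keeps file sizes manageable.
import cdsapi

c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-pressure-levels",
    {
        "product_type": "reanalysis",
        "variable": ["temperature", "u_component_of_wind",
                     "v_component_of_wind", "specific_humidity", "geopotential"],
        "pressure_level": ["50", "250", "500", "850", "1000"],
        "year": "1959",
        "month": "01",
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": ["00:00", "06:00", "12:00", "18:00"],
        "grid": [1.0, 1.0],          # 1-degree regular lat/lon grid
        "format": "netcdf",
    },
    "era5_1959_01.nc",
)
```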

paapu88 · Aug 06 '22 08:08

One great place would be Hugging Face Datasets: it's free hosting and has some nice features, and then you should be able to just take the GFS training script, change the HF dataset path, and have it roughly work. If you are hosting it somewhere else, I'd also be happy to rehost it on OCF's HF account and add the data-loading script for it. The ERA5 data I'm pushing to that dataset right now is an hourly, all-variables dataset, so it is very large, and the plan is to only go back to 2016 with it for now, so having a smaller one that goes back to 1959 would be fantastic!
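
As a rough illustration of that swap, a minimal sketch assuming the ERA5 dataset can be read with the `datasets` library; how the records map onto model inputs is hypothetical and depends on how the dataset was written:

```python
# Minimal sketch of pointing a training loop at the Hugging Face ERA5 dataset
# instead of the GFS one. Only the repo id should need to change in the
# existing GFS training script; the per-record conversion below is hypothetical.
from datasets import load_dataset

era5 = load_dataset("openclimatefix/era5", split="train", streaming=True)

for example in era5:
    # Convert each record into the (num_nodes, num_features) array the model
    # expects, then run the usual training step.
    ...
```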

jacobbieker · Aug 06 '22 09:08

OK, I'll start pushing the 6-hourly ERA5 data to Hugging Face; I need to read the docs first...
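
A minimal sketch of what that push might look like with `huggingface_hub`; the repo id and local path are placeholders, not an existing dataset:

```python
# Minimal sketch of uploading local ERA5 files to a Hugging Face dataset repo.
# The repo id and folder path below are placeholders.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("paapu88/era5-6h-1deg", repo_type="dataset", exist_ok=True)
api.upload_folder(
    folder_path="./era5_6h",          # local directory with the downloaded files
    repo_id="paapu88/era5-6h-1deg",   # placeholder dataset id
    repo_type="dataset",
)
```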

paapu88 · Aug 06 '22 09:08

@all-contributors please add @paapu88 for question

peterdudfield · Sep 13 '22 14:09

@peterdudfield

I've put up a pull request to add @paapu88! :tada:

allcontributors[bot] · Sep 13 '22 14:09

@paapu88 @jacobbieker - Hi Markus and Jacob - were you able to get this to train well on the ERA5 data? We're also trying to fit the GNN to 1-degree ERA5 in an attempt to reproduce Keisler's results. Limited progress so far: the loss plateaus immediately and only marginally improves over persistence. We're debugging it now, but do you have any tips or tricks (loss normalization, etc.) as to what might improve things?

Cheers, ~Mihai

mishooax · Sep 13 '22 15:09

Sorry, slow progress here; I was moved to another project. Some findings anyway:

  • in the latitude weighting of the loss, the latitude must be in radians, not degrees (see the sketch below)
  • it is a problem to fit the 1° × 1° ERA5 data into GPU memory: using just one step (6 h) in training already requires about 10 GB of GPU memory, while in the Keisler article training was done with up to 12 timesteps... We run out of GPU memory with 2 timesteps...
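
A minimal sketch of a latitude-weighted loss with the degrees-to-radians conversion applied; the tensor shapes and the normalisation are assumptions, not necessarily the library's exact loss:

```python
# Minimal sketch of latitude-weighted MSE; the key point is converting the
# latitudes from degrees to radians before taking the cosine.
import torch

def latitude_weighted_mse(pred, target, lat_deg):
    # pred, target: (batch, num_nodes, num_features); lat_deg: (num_nodes,) in degrees
    lat_rad = torch.deg2rad(lat_deg)
    weights = torch.cos(lat_rad)
    weights = weights / weights.mean()   # normalise so the weights average to 1
    err = (pred - target) ** 2
    return (weights[None, :, None] * err).mean()
```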

paapu88 · Sep 15 '22 12:09

@paapu88 - thanks for your comments. Yes, we're already converting latitude to radians before calculating the weights. Fully agree regarding GPU memory usage. We're running on A100s (40 GB VRAM), so we're able to squeeze in a batch size of 1 with a rollout window of 8. This is with activation checkpointing on the message-passing GNN layers - see https://pytorch.org/docs/stable/checkpoint.html
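
A minimal sketch of that checkpointing pattern; `processor_blocks` and the argument names are hypothetical stand-ins for the actual GNN layers, not graph_weather's classes:

```python
# Minimal sketch of activation checkpointing over a stack of message-passing
# blocks. Checkpointing discards intermediate activations in the forward pass
# and recomputes them during backward, trading compute for GPU memory.
from torch.utils.checkpoint import checkpoint

def run_processor(processor_blocks, node_features, edge_index, edge_features):
    x = node_features
    for block in processor_blocks:
        x = checkpoint(block, x, edge_index, edge_features)
    return x
```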

mishooax · Sep 16 '22 11:09

Thanks @mishooax, you are lucky with resources! I'll try to report back if we find something clever for memory usage...

paapu88 · Sep 16 '22 16:09