graph_weather
ERA5 training to reproduce Keisler?
Dear Developers,
Is there any code for training the network with ERA5 data to reproduce the Keisler paper?
There seem to be scripts in graph_weather/train, but they all appear to be for GFS data?
Best regards, Markus
Not currently. I've been slowly pushing ERA5 data to HuggingFace here https://huggingface.co/datasets/openclimatefix/era5 and the code for training with GFS should be fairly simple to change to use ERA5. Would welcome any contributions to add an ERA5 training script! I will probably get around to it, just not super soon.
- Thanks for the quick update. I'll make an ERA5 fork and start working on it.
- I have downloaded ERA5 data for 1959-2022 at a 6-hour interval on a 1° grid to Oracle Cloud. I took some extra variables compared to Keisler. I can copy that data somewhere if that would help (29 GB/year); a sketch of such a download request follows this list.
- I'll not close this, because I plan to ask ERA5-training-related questions here...
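For reference, a download along these lines can be requested through the CDS API; the variable list below is only a placeholder, not the exact set used here:

```python
# Illustrative CDS API request for one year of 6-hourly, 1-degree ERA5 single-level data.
# The variable list is a placeholder; Keisler-style training also needs pressure-level fields.
import cdsapi

c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": ["2m_temperature", "mean_sea_level_pressure"],
        "year": "1959",
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": ["00:00", "06:00", "12:00", "18:00"],
        "grid": [1.0, 1.0],
        "format": "netcdf",
    },
    "era5_1959_6h_1deg.nc",
)
```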
One great place would be HuggingFace Datasets: it's free hosting and has some nice features, and you should then be able to just take the GFS training script, change the HF dataset path, and it should roughly work. If you are hosting it somewhere else, I'd also be happy to rehost it on OCF's HF account and add the data loading script for it. The ERA5 data I'm pushing to that dataset right now is the hourly, all-variables dataset, so it is very large, and the plan is to only go back to 2016 with it for now, so having a smaller one that goes back to 1959 would be fantastic!
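A rough sketch of what the swapped-in training loop could look like, following the pattern of the GFS script and the graph_weather README; the split and record keys (`state_t`, `state_t6`) are assumptions about how the ERA5 records might be laid out:

```python
# Rough sketch only: assumes the HF dataset exposes per-timestep records with flat
# [num_nodes, num_features] arrays under hypothetical keys "state_t" / "state_t6".
import torch
from datasets import load_dataset
from graph_weather import GraphWeatherForecaster
from graph_weather.models.losses import NormalizedMSELoss

ds = load_dataset("openclimatefix/era5", split="train", streaming=True)

# 1-degree lat/lon grid -> 180 * 360 = 64,800 nodes
lat_lons = [(lat, lon) for lat in range(-90, 90) for lon in range(0, 360)]
feature_dim = 78  # depends on which variables/levels are kept

model = GraphWeatherForecaster(lat_lons, feature_dim=feature_dim)
criterion = NormalizedMSELoss(lat_lons=lat_lons, feature_variance=torch.ones(feature_dim))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for example in ds:
    x = torch.tensor(example["state_t"]).unsqueeze(0)   # state at t
    y = torch.tensor(example["state_t6"]).unsqueeze(0)  # state at t + 6 h
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```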
OK, I'll start pushing the ERA5 6h data to HuggingFace; I need to read the docs first...
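Pushing the files could be done with `huggingface_hub`, for example (the local folder layout and repo paths below are illustrative only):

```python
# Illustrative upload of one year of local ERA5 files to a HF dataset repo.
from huggingface_hub import HfApi

api = HfApi()  # expects a token from `huggingface-cli login` or the HF_TOKEN env var
api.upload_folder(
    folder_path="era5_6h_1deg/1959",
    path_in_repo="data/1959",
    repo_id="openclimatefix/era5",
    repo_type="dataset",
)
```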
@all-contributors please add @paapu88 for question
@paapu88 @jacobbieker - Hi Markus and Jacob - were you able to get this to train well on the ERA5 data? We're also trying to fit the GNN to 1-degree ERA5 in an attempt to reproduce Keisler's results. Limited progress so far: the loss plateaus immediately and only marginally improves over persistence. We're debugging it now, but do you have any tips/tricks (loss normalization, etc.) as to what might improve things?
Cheers, ~Mihai
Sorry, slow progress here; I was moved to another project. Some findings anyway:
- in the latitude-weighted loss, latitude must be in radians, not degrees (see the sketch after this list)
- it is a problem to fit ERA5 1° x 1° data into GPU memory: using just one step (6 h) in training already requires about 10 GB of GPU memory, and in the Keisler article training was done with up to 12 timesteps... We run out of GPU memory with 2 timesteps...
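On the first point, a minimal sketch of cosine-of-latitude loss weighting showing the radians-vs-degrees pitfall (names are illustrative, not the actual graph_weather loss):

```python
import numpy as np
import torch

# latitudes of the 1-degree grid nodes, in degrees (one entry per node)
node_lats_deg = np.repeat(np.arange(-90.0, 90.0, 1.0), 360)
weights = np.cos(np.deg2rad(node_lats_deg))   # correct: convert to radians first
# weights = np.cos(node_lats_deg)             # wrong: np.cos treats its input as radians
weights = torch.tensor(weights / weights.mean(), dtype=torch.float32)

def latitude_weighted_mse(pred, target, lat_weights):
    # pred, target: [batch, nodes, features]; lat_weights: [nodes]
    return (((pred - target) ** 2) * lat_weights[None, :, None]).mean()
```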
@paapu88 - thanks for your comments. Yes, we're already converting latitude to radians before calculating the weights. Fully agree wrt GPU memory usage. We're running on A100s (40 GB VRAM), so we're able to squeeze in a batch size of 1 with a rollout window of 8. This is with activation checkpointing on the MP GNN layers - see https://pytorch.org/docs/stable/checkpoint.html
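A sketch of what checkpointing the message-passing blocks can look like with `torch.utils.checkpoint` (the `processor_blocks` module list is a hypothetical stand-in, not the actual graph_weather processor):

```python
# Recompute each message-passing block's activations during backward to save memory.
import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedProcessor(torch.nn.Module):
    def __init__(self, processor_blocks: torch.nn.ModuleList):
        super().__init__()
        self.blocks = processor_blocks

    def forward(self, x, edge_index):
        for block in self.blocks:
            # Trades extra forward compute for a much smaller activation footprint.
            x = checkpoint(block, x, edge_index, use_reentrant=False)
        return x
```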
Thanks @mishooax, you are lucky with resources! I'll try to report back if we find something clever for memory usage...