
How to get landsea.zarr file?

Open dongZheX opened this issue 2 years ago • 8 comments

Thanks for the code.

I have tried to execute train/run.py, but landsea.zarr is missing. How can I get this file?

self.landsea = xr.open_zarr("/home/bieker/Downloads/landsea.zarr", consolidated=True).load()

(If I'm just overlooking something obvious, please don't mind -.-) Thanks.

dongZheX avatar Feb 06 '23 19:02 dongZheX

It's available here: https://huggingface.co/datasets/openclimatefix/gfs-reforecast/blob/main/data/invariant/landsea.zarr.zip. It is just the ERA5 land/sea mask, so it's also available from the CDS and ECMWF websites.
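
For reference, here is a minimal sketch of fetching and opening that file with the huggingface_hub package (the local extraction path is illustrative and may need adjusting to the zip's internal layout):

import zipfile
import xarray as xr
from huggingface_hub import hf_hub_download

# Download the zipped zarr store from the openclimatefix/gfs-reforecast dataset.
zip_path = hf_hub_download(
    repo_id="openclimatefix/gfs-reforecast",
    filename="data/invariant/landsea.zarr.zip",
    repo_type="dataset",
)

# Unpack it; depending on the zip layout the store may land in a subdirectory.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall("landsea_extracted")

# Open it the same way train/run.py does (adjust the path to the extracted store).
landsea = xr.open_zarr("landsea_extracted/landsea.zarr", consolidated=True).load()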

jacobbieker avatar Feb 08 '23 17:02 jacobbieker

It's available here: https://huggingface.co/datasets/openclimatefix/gfs-reforecast/blob/main/data/invariant/landsea.zarr.zip. It is just the ERA5 land/sea mask, so it's also available from the CDS and ECMWF websites.

Thanks.

I'm new to weather forecasting. Thanks for the patient answer; I am trying to reproduce GraphCast based on your code.

Now I am trying to execute train/run.py, and I have found some issues in the code.

At present:

  • the default value of resolution for XrDataset should be "2deg", not "2.0deg"; the latter can lead to errors about the data shape.
  • there are NaN values in the inputs, which make the model output contain NaN, so the program raises an assertion (because the standard deviation of "sr", which is used to normalize the land/sea data, is 0.0). However, even after removing "sr" it still crashes (see the sanity-check sketch after the traceback below).

[1, 1] loss: 0.144 Time: 2.1553783416748047 sec
[1, 2] loss: 0.145 Time: 1.466963529586792 sec
[1, 3] loss: 0.153 Time: 1.4628980159759521 sec
Traceback (most recent call last):
  File "run.py", line 497, in
    loss = criterion(outputs, labels)
  File "/home/work/dongzhe05/anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ssd3/dongzhe05/projects/graph_weather/graph_weather/models/losses.py", line 57, in forward
    assert not torch.isnan(out).any()
AssertionError
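
As a quick sanity check, something like the following can show which input fields actually contain NaN before the forward pass (a sketch; treating the batch as a dict of tensors is only illustrative):

import torch

def report_nans(batch):
    # Print the fraction of NaN entries per input field so the offending
    # variables (e.g. the normalized land/sea channels) can be identified.
    for name, tensor in batch.items():
        nan_fraction = torch.isnan(tensor).float().mean().item()
        if nan_fraction > 0:
            print(f"{name}: {nan_fraction:.2%} NaN")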

Thanks again.

dongZheX avatar Feb 08 '23 18:02 dongZheX

Now I skip the data containing NaN, and the loss stays at 0.135 in epoch 2.

[2,  1626] loss: 0.135 Time: 1.461592197418213 sec
[2,  1627] loss: 0.135 Time: 1.4107253551483154 sec
[2,  1628] loss: 0.135 Time: 1.411616325378418 sec
[2,  1629] loss: 0.135 Time: 1.4023494720458984 sec
[2,  1630] loss: 0.135 Time: 1.4118192195892334 sec
[2,  1631] loss: 0.135 Time: 1.413177728652954 sec
[2,  1632] loss: 0.135 Time: 1.4135842323303223 sec
[2,  1633] loss: 0.135 Time: 1.4481475353240967 sec
[2,  1634] loss: 0.135 Time: 1.4366750717163086 sec
[2,  1635] loss: 0.135 Time: 1.4205167293548584 sec
[2,  1636] loss: 0.135 Time: 1.4060397148132324 sec
[2,  1637] loss: 0.135 Time: 1.4105403423309326 sec
[2,  1638] loss: 0.135 Time: 1.4151151180267334 sec
[2,  1639] loss: 0.135 Time: 1.412851333618164 sec
[2,  1640] loss: 0.135 Time: 1.411670207977295 sec
[2,  1641] loss: 0.135 Time: 1.4512145519256592 sec
[2,  1642] loss: 0.135 Time: 1.4370694160461426 sec
[2,  1643] loss: 0.135 Time: 1.4110925197601318 sec
[2,  1644] loss: 0.135 Time: 1.4113273620605469 sec
[2,  1645] loss: 0.135 Time: 1.4102823734283447 sec
[2,  1647] loss: 0.135 Time: 1.4005143642425537 sec
[2,  1648] loss: 0.135 Time: 1.411473274230957 sec
[2,  1649] loss: 0.135 Time: 1.4430499076843262 sec
[2,  1651] loss: 0.134 Time: 1.414628505706787 sec
[2,  1652] loss: 0.135 Time: 1.4113256931304932 sec
[2,  1653] loss: 0.135 Time: 1.396705150604248 sec
[2,  1654] loss: 0.134 Time: 1.4125947952270508 sec
[2,  1655] loss: 0.135 Time: 1.4116289615631104 sec
[2,  1656] loss: 0.135 Time: 1.4174928665161133 sec
[2,  1657] loss: 0.135 Time: 1.4478068351745605 sec
[2,  1658] loss: 0.135 Time: 1.4566729068756104 sec
[2,  1660] loss: 0.134 Time: 1.3963241577148438 sec
[2,  1661] loss: 0.134 Time: 1.4090869426727295 sec
[2,  1662] loss: 0.134 Time: 1.4048657417297363 sec
[2,  1664] loss: 0.134 Time: 1.4057867527008057 sec
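
For anyone hitting the same assertion, the skipping described above can be done with a simple guard in the training loop, roughly like this (a sketch; dataloader, model, criterion, and optimizer stand in for the objects set up in run.py):

import torch

for inputs, labels in dataloader:
    # Skip any sample whose inputs or targets contain NaN, since they would
    # propagate through the model and trip the assertion in the loss.
    if torch.isnan(inputs).any() or torch.isnan(labels).any():
        continue
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()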

dongZheX avatar Feb 09 '23 05:02 dongZheX

Hmmm... Yeah, I'm not entirely sure why that's the case. There could be a bug in the code; when I've tried training it, it's been quite slow before.

jacobbieker avatar Feb 15 '23 14:02 jacobbieker

@all-contributors please add @dongZheX for question

peterdudfield avatar Feb 17 '23 13:02 peterdudfield

@peterdudfield

I've put up a pull request to add @dongZheX! :tada:

allcontributors[bot] avatar Feb 17 '23 13:02 allcontributors[bot]

Now I skip the data containing NaN, and the loss stays at 0.135 in epoch 2.

Hi dongZheX, have you addressed the issue of NaN values in the downloaded data?

Esperanto-mega avatar Apr 19 '23 02:04 Esperanto-mega

Hello. In the code, the static geographic data is normalized by subtracting the mean and dividing by the standard deviation. However, in const.py, LANDSEA_STD = {"sr": 0.0, ...} sets a zero standard deviation for the variable "sr", so its normalized value blows up towards infinity. This can produce NaN values in the model predictions. After commenting out the corresponding code, training runs normally.
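
A minimal sketch of guarding the normalization against a zero standard deviation instead of dropping the variable (a LANDSEA_MEAN dict alongside LANDSEA_STD in const.py is assumed; only LANDSEA_STD = {"sr": 0.0, ...} is confirmed above):

import numpy as np

def normalize_landsea(data, means, stds):
    # Normalize each static field as (x - mean) / std, but fall back to only
    # centring when std == 0 so constant variables such as "sr" do not
    # produce inf/NaN values.
    normalized = {}
    for name, values in data.items():
        centred = np.asarray(values, dtype=np.float32) - means.get(name, 0.0)
        std = stds.get(name, 1.0)
        normalized[name] = centred if std == 0.0 else centred / std
    return normalized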

Liu990406 avatar Jun 30 '23 08:06 Liu990406