
Channel Directory

Open • luiservela opened this issue 1 year ago • 4 comments

Hi guys, I am trying to make sense of the list of channels [0..19]. I would like to know which channel is first, which is second, and so on.

By looking at this file https://github.com/NVlabs/FourCastNet/blob/master/data_process/parallel_copy_small_set.py I got the following channel list:

```
 0: u10     5: t_850    10: v_850   15: t_500
 1: v10     6: u_1000   11: z_850   16: z_50
 2: t2m     7: v_1000   12: u_500   17: r_500
 3: sp      8: z_1000   13: v_500   18: r_850
 4: msl     9: u_850    14: z_500   19: tcwv
```
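In Python form (my own transcription from that script, so treat it as provisional until someone confirms it):

```python
# Channel index -> variable name, transcribed by hand from
# data_process/parallel_copy_small_set.py; provisional, not authoritative.
CHANNEL_NAMES = [
    "u10", "v10", "t2m", "sp", "msl",
    "t_850", "u_1000", "v_1000", "z_1000", "u_850",
    "v_850", "z_850", "u_500", "v_500", "z_500",
    "t_500", "z_50", "r_500", "r_850", "tcwv",
]
CHANNEL_INDEX = {name: idx for idx, name in enumerate(CHANNEL_NAMES)}
assert CHANNEL_INDEX["z_500"] == 14
```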

Is this the correct mapping? Where can I find a definitive one?

Best,

Luis V

luiservela commented on Oct 30 '24

Also, while looking at the stale branch "feature-gfs-initialization", specifically at the "gfsinit/grb2hdf5.py" file (https://github.com/NVlabs/FourCastNet/blob/feature-gfs-initialization/gfsinit/grb2hdf5.py), I see an identical mapping (the stale GFS file has 25 channels, but the first 20 seem to follow the same mapping as the channels on the master branch).

luiservela commented on Oct 30 '24

Hello, I have been looking at the .h5 file I created using your parallel_copy_small_set.py code and comparing it against the ERA5 files you provide via Globus download, out_of_sample/2018.h5 in particular.

  • I first compared the shapes of the two datasets and noticed they differ: the ERA5 file has 21 channels, while the one created via your script has only 20. My theory is that the 21st field corresponds to SST, which is currently commented out in the parallel_copy_small_set.py script.
  • I then compared the mean values (across lat/lon) of each of the other 20 channels and saw some major discrepancies.
  • Last, and most importantly, I performed a basic hypothesis test: "If the channel means and standard deviations are given by the global_means.npy and global_stds.npy files, then the mean values observed both in my own .h5 file and in the provided ERA5 .h5 file should be highly probable (say p > 5%)." Computing these probabilities showed that all channels of the ERA5 .h5 file are "likely" to occur, while for four channels (z_50, z_500, z_850 and z_1000) of the file created with your script the probabilities fall below 1e-10. This makes it highly improbable that these 4 channels are treated correctly by the parallel_copy_small_set.py script (see the sketch after this list).
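For concreteness, here is a minimal sketch of the check (assuming h5py, numpy and scipy, that both files store their data under a 'fields' dataset as the Globus-distributed files do, and that the stats .npy files squeeze to one value per channel; paths are placeholders):

```python
import h5py
import numpy as np
from scipy import stats

# Placeholder paths; point these at your own copies.
mine = h5py.File("my_2018.h5", "r")["fields"]              # (T, 20, H, W)
era5 = h5py.File("out_of_sample/2018.h5", "r")["fields"]   # (T, 21, H, W)
print(mine.shape, era5.shape)  # shapes differ by one channel

means = np.load("global_means.npy").squeeze()  # per-channel global mean
stds = np.load("global_stds.npy").squeeze()    # per-channel global std

# Crude per-channel test on the first timestep: how far is the observed
# spatial mean from the advertised global mean, in units of global std?
for c in range(20):
    m_obs = np.asarray(mine[0, c]).mean()
    z = (m_obs - means[c]) / stds[c]
    p = 2 * stats.norm.sf(abs(z))  # two-sided tail probability
    print(f"channel {c:2d}: mean={m_obs:12.3f}  z={z:7.2f}  p={p:.1e}")
```

(The test is deliberately crude, since the spatial mean has far smaller variance than a single grid point, but it was enough to flag the z_* channels.)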

Have there been any significant changes to this script since the time the out_of_sample/2018.h5 files were created? Perhaps a re-arrangement of the channels?

Best, Luis

luiservela commented on Nov 12 '24

Update: I believe I have resolved the issue.

The problem, I believe, is the ordering of the pressure levels in the .nc files downloaded from the Copernicus Data Store (the ones obtained using copernicus/get_data_pl_short_length.py): they seem to be in the inverse order.

That is, the current repo would work just fine (that is, download new data to be used as initial conditions) if the pressure levels were stored in ascending [50, 500, 850, 1000] order, like in the ERA5 files one can obtain via the Globus downloads.

However, a request to the CDS creates a file where the p-levels are in descending order [1000, 850, 500, 50] (like any file downloaded from the CDS today), and therefore the model generates garbage-looking outputs without failing.

To correct for this issue, one must properly reorder the fields in the parallel_copy_small_set.py file.
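Something along these lines (a sketch only; I use xarray here for illustration, the actual script reads the .nc file its own way, and the level coordinate may be named "level" or "pressure_level" depending on the CDS API version):

```python
import xarray as xr

# Placeholder path for a CDS pressure-level download.
ds = xr.open_dataset("pl_data.nc")
lev = "level" if "level" in ds.coords else "pressure_level"

print("levels as downloaded:", ds[lev].values)  # e.g. [1000, 850, 500, 50]

# Sort to the ascending [50, 500, 850, 1000] order that the
# Globus-distributed ERA5 .h5 files (and hence the model) use.
ds = ds.sortby(lev)
print("levels after fix:", ds[lev].values)      # [50, 500, 850, 1000]
```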

Note: The CDS went through a major update on 26 September 2024, and this ordering might have been one such change - though I can't be completely certain about this (yet).

Best, Luis V

luiservela commented on Nov 13 '24

Hi @luiservela, thanks for your advice. I applied your suggestion in the parallel_copy_small_set.py file, and the metric results improved when working with the new H5 file. Thank you for sharing this update here. I had been struggling with this issue for quite some time, and your advice helped me resolve it. Additionally, I encountered a few other issues while running inference with FourCastNet, and I would like to share them with you if you're open to it. Would it be possible for you to provide a preferred contact method where we could discuss this further?

Kardelennkayaa commented on Mar 07 '25