cosima-cookbook

Loading CICE data is very expensive

Open MartinDix opened this issue 2 years ago • 4 comments

Loading a CICE variable takes much more time and memory than a MOM variable. E.g.

import cosima_cookbook as cc
session = cc.database.create_session()
expt = '025deg_jra55_ryf9091_gadi'
aice = cc.querying.getvar(expt, 'aice_m', session, n=120)

takes 90 s and several GB of memory (from notebook on OOD) compared to

sea_level = cc.querying.getvar(expt, 'sea_level', session, n=120)

which takes ~15 s. Trying to load the full run for a CICE variable takes a crazy amount of memory.

I think the issue is that the CICE variables have

                aice_m:coordinates = "TLON TLAT time" ;

where TLON and TLAT are 2D variables included in the CICE files. MOM variables have

                sea_level:coordinates = "geolon_t geolat_t" ;

where geolon_t and geolat_t are not in the files.

I think this means that xarray.open_mfdataset is reading TLON and TLAT for each file to check if it has to concatenate on those coordinates.

I couldn't see a way of persuading xarray that it should only try to concatenate on the time dimension.

MartinDix, Jun 08 '22 05:06

Hi Martin. I'm not sure of your specific case, but when loading datasets using xr.open_mfdataset I typically use something like:

OISST = xr.open_mfdataset(
    '/g/data/ua8/NOAA_OISST/AVHRR/v2-1_modified/*_' + str(year) + '.nc',
    concat_dim="time", combine="nested",
    data_vars='minimal', coords='minimal', compat='override',
    parallel=True,
)

This makes some extra assumptions about concat variables etc. and makes the loading much quicker. It's described in more detail in the "Note" at https://xarray.pydata.org/en/stable/user-guide/io.html#reading-multi-file-datasets
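
For reference, a minimal sketch of how those options could be bundled for reuse. The helper names `minimal_concat_kwargs` and `open_time_series` are hypothetical and not part of xarray or the cookbook; the option values are the ones from the call above. It assumes the file names sort into time order, since `combine="nested"` concatenates files in the order they are given.

```python
def minimal_concat_kwargs(concat_dim="time", decode_coords=True):
    """Return open_mfdataset options that concatenate only along `concat_dim`.

    Hypothetical helper: it just collects the keyword arguments used in the
    call above so they can be reused or overridden.
    """
    return {
        "combine": "nested",          # trust the file order, no coordinate sorting
        "concat_dim": concat_dim,     # the only dimension to concatenate along
        "data_vars": "minimal",       # only concatenate variables that have concat_dim
        "coords": "minimal",          # same rule for coordinate variables
        "compat": "override",         # take other variables from the first file unchecked
        "decode_coords": decode_coords,
    }


def open_time_series(paths, **overrides):
    """Open many files as one dataset using the options above."""
    import xarray as xr  # imported lazily so the helper above works without xarray

    kwargs = minimal_concat_kwargs()
    kwargs.update(overrides)
    # Assumption: sorting the file names puts them in time order.
    return xr.open_mfdataset(sorted(paths), **kwargs)
```

Something like `open_time_series(glob.glob(pattern), parallel=True)` would then reproduce the call above. Note that `compat='override'` skips the consistency checks between files, so it is only safe when the non-time variables really are identical in every file.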

I would have to defer to @angus-g or @aidanheerdegen as to whether these options are/should be implemented in the cookbook.

rmholmes, Jun 08 '22 06:06

decode_coords = False speeds it up a lot, as in this IcePlottingExample.
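
A rough way to check the speed-up on your own experiment, sketched below. The helpers `timed` and `compare_decode_coords` are hypothetical names, and the sketch assumes `cc.querying.getvar` forwards extra keyword arguments such as `decode_coords` to xarray, which is how the option is used here.

```python
import time


def timed(loader, *args, **kwargs):
    """Call loader(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = loader(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_decode_coords(expt, variable="aice_m", n=120):
    """Time getvar with and without coordinate decoding.

    Needs cosima_cookbook and access to the cookbook database, so it is
    imported lazily here.
    """
    import cosima_cookbook as cc

    session = cc.database.create_session()
    # Default behaviour: xarray decodes the "coordinates" attribute.
    _, t_default = timed(cc.querying.getvar, expt, variable, session, n=n)
    # Assumption: decode_coords is passed through to xarray by getvar.
    _, t_raw = timed(
        cc.querying.getvar, expt, variable, session, n=n, decode_coords=False
    )
    return t_default, t_raw
```

For example, `compare_decode_coords('025deg_jra55_ryf9091_gadi')` returns the two timings in seconds. File system caching can make the second call look faster than it really is, so it is worth repeating the comparison in the opposite order.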

adele-morrison, Jun 08 '22 06:06

Thanks Adele, decode_coords is what I'd been looking for.

MartinDix, Jun 09 '22 23:06

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/issues-loading-access-om2-01-data-from-cycle-4/418/3

access-hive-bot, Feb 15 '23 02:02