cosima-cookbook
Loading CICE data is very expensive
Loading a CICE variable takes much more time and memory than loading a MOM variable. For example,
import cosima_cookbook as cc
session = cc.database.create_session()
expt = '025deg_jra55_ryf9091_gadi'
aice = cc.querying.getvar(expt, 'aice_m', session, n=120)
takes 90 s and several GB of memory (from a notebook on OOD), compared to
sea_level = cc.querying.getvar(expt, 'sea_level', session, n=120)
which takes ~15s. Trying to load the full run for a CICE variable takes a crazy amount of memory.
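To make the comparison above repeatable, a minimal timing sketch follows. It uses only the standard library; `timed` and `results` are hypothetical names introduced here, not part of the cookbook, and the sketch measures wall-clock time only, not memory.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed loads are still timed.
        results[label] = time.perf_counter() - start


# Intended use (needs the cookbook and its database, so left as comments):
# results = {}
# with timed("aice_m", results):
#     aice = cc.querying.getvar(expt, 'aice_m', session, n=120)
# with timed("sea_level", results):
#     sea_level = cc.querying.getvar(expt, 'sea_level', session, n=120)
# print(results)
```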
I think the issue is that the CICE variables have
aice_m:coordinates = "TLON TLAT time" ;
where TLON and TLAT are 2D variables included in the CICE files. MOM variables have
sea_level:coordinates = "geolon_t geolat_t" ;
where geolon_t and geolat_t are not in the files.
I think this means that xarray.open_mfdataset is reading TLON and TLAT for each file to check if it has to concatenate on those coordinates.
I couldn't see a way of persuading xarray that it should only try to concatenate on the time dimension.
Hi Martin. I'm not sure of your specific case, but when loading datasets using xr.open_mfdataset I typically use something like:
OISST = xr.open_mfdataset('/g/data/ua8/NOAA_OISST/AVHRR/v2-1_modified/*_' + str(year) + '.nc', concat_dim="time", combine="nested", data_vars='minimal', coords='minimal', compat='override', parallel=True)
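These options make some extra assumptions (for example, that variables without the concatenation dimension are the same in every file, so they are taken from the first file without being compared) and make the loading much quicker. It's described in more detail in the "Note" at https://xarray.pydata.org/en/stable/user-guide/io.html#reading-multi-file-datasets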
I would have to defer to @angus-g or @aidanheerdegen as to whether these options are/should be implemented in the cookbook.
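The options in the call above can be collected into a small reusable helper. This is a sketch only: `fast_open_kwargs` and `open_along_time` are hypothetical names, `"time"` is assumed to be the concatenation dimension, and nothing here is taken from the cookbook's own code. The comments paraphrase the xarray documentation for each option.

```python
def fast_open_kwargs(**overrides):
    """Return open_mfdataset keyword arguments that avoid per-file comparisons."""
    kwargs = {
        "combine": "nested",     # concatenate files in the order given
        "concat_dim": "time",    # assumed name of the dimension to join along
        "data_vars": "minimal",  # only concatenate data variables that already have that dimension
        "coords": "minimal",     # likewise for coordinates, so 2D lon/lat are not concatenated
        "compat": "override",    # take other variables from the first file without comparing
        "parallel": True,        # open files in parallel via dask
    }
    kwargs.update(overrides)
    return kwargs


def open_along_time(paths, **overrides):
    """Open many files as one dataset, concatenating only along time."""
    import xarray as xr  # imported here so the helper above works without xarray

    return xr.open_mfdataset(paths, **fast_open_kwargs(**overrides))
```

For example, `open_along_time('/path/to/files/*.nc', parallel=False)` would use the same settings but open the files serially; the path is a placeholder.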
decode_coords=False speeds it up a lot, as in this IcePlottingExample.
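A minimal sketch of applying this through the cookbook follows. It assumes that `cc.querying.getvar` forwards extra keyword arguments such as `decode_coords` to `xarray.open_mfdataset`, which this thread does not confirm; `cice_getvar_kwargs` and `load_cice` are hypothetical helper names.

```python
def cice_getvar_kwargs(**overrides):
    """Keyword arguments for loading a CICE variable (hypothetical helper)."""
    # decode_coords=False stops xarray from interpreting the "coordinates"
    # attribute, so TLON and TLAT are not read as coordinates for every file.
    kwargs = {"decode_coords": False}
    kwargs.update(overrides)
    return kwargs


def load_cice(expt, variable, session, **overrides):
    """Load a CICE variable with coordinate decoding switched off."""
    import cosima_cookbook as cc  # imported here so the helper above runs without it

    # Assumption: getvar passes these keyword arguments on to xarray.
    return cc.querying.getvar(expt, variable, session, **cice_getvar_kwargs(**overrides))
```

With the names from the first post, this would be called as `load_cice(expt, 'aice_m', session, n=120)`. Because the coordinates attribute is not decoded, TLON and TLAT are not attached to the returned array as coordinates and would need to be added separately if they are needed, for example for plotting.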
Thanks Adele, decode_coords is what I'd been looking for.
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:
https://forum.access-hive.org.au/t/issues-loading-access-om2-01-data-from-cycle-4/418/3