[Feature]: Improve Memory Handling in Regridding routines
Is your feature request related to a problem?
When trying to regrid large datasets, you occasionally run into memory issues, e.g.:

```
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 413. GiB for an array with shape (235403, 235403) and data type float64
```

or

```
TypeError: buffer is too small for requested array
```
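For scale, the reported allocation is exactly what a dense square float64 array of that shape costs, which suggests the regridder is materializing an (n_src, n_dst) array for this unstructured grid. A quick sanity check:

```python
# Reproduce the 413 GiB figure from the error message:
# an (n, n) float64 array at 8 bytes per element.
n = 235403
gib = n * n * 8 / 2**30
print(f"{gib:.0f} GiB")  # ~413 GiB, matching the ArrayMemoryError
```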
Describe the solution you'd like
Ideally, xcdat would:
- preempt these issues by chunking the data or raising a warning when trouble looks likely (based on array size);
- or work better with dask/chunking (maybe there are options we can set or recommend); it seems like chunking a large dataset and looping over those chunks might help;
- or, failing that, provide some hints when these exceptions occur.
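The first bullet could be a cheap pre-flight check: estimate the dense weight-array footprint from the source and destination grid sizes and warn before attempting the allocation. A minimal sketch, assuming a dense float64 (n_src, n_dst) array as in the error above; the function name and threshold are hypothetical, not part of xcdat's API:

```python
import warnings

import numpy as np


def check_regrid_memory(n_src_cells, n_dst_cells, limit_gib=16.0):
    """Hypothetical pre-flight check: estimate the memory (GiB) a dense
    (n_src, n_dst) float64 weight array would need and warn if it exceeds
    `limit_gib`. Returns the estimate so callers can decide what to do."""
    est_gib = n_src_cells * n_dst_cells * np.dtype(np.float64).itemsize / 2**30
    if est_gib > limit_gib:
        warnings.warn(
            f"regridding may allocate ~{est_gib:.0f} GiB "
            f"(source={n_src_cells} cells, destination={n_dst_cells} cells); "
            "consider chunking the dataset first"
        )
    return est_gib


# The failing case from the error message warns well before allocation:
check_regrid_memory(235403, 235403)  # ~413 GiB
```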
Describe alternatives you've considered
I tried both chunking the data (e.g., `ds = xcdat.open_dataset(..., chunks={'time': 100})`) and regridding just one time step (e.g., `ds = ds.isel(time=[0])`).
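The chunked approach could also be driven manually: loop over slices of the time axis, regrid each slice, and concatenate the results. A generic sketch of that pattern in pure NumPy; `regrid_slice` is a hypothetical stand-in for the per-slice regrid call (e.g., `isel` plus `regridder.horizontal`):

```python
import numpy as np


def regrid_in_chunks(data, chunk_size, regrid_slice):
    """Apply `regrid_slice` to successive blocks along axis 0 (time)
    and stitch the results back together, bounding peak memory to one
    chunk's worth of intermediate arrays."""
    out = []
    for start in range(0, data.shape[0], chunk_size):
        out.append(regrid_slice(data[start:start + chunk_size]))
    return np.concatenate(out, axis=0)


# Toy usage: "regrid" by averaging the two lon columns into one.
data = np.arange(24.0).reshape(6, 2, 2)  # (time, lat, lon)
result = regrid_in_chunks(data, 2, lambda x: x.mean(axis=-1, keepdims=True))
print(result.shape)  # (6, 2, 1)
```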
```python
import numpy as np

import xcdat

dpath = '/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/historical/r1i1p1f1/SImon/siconc/gn/v20200212/'
dpath = '/p/css03/esgf_publish/CMIP6/CMIP/MPI-M/ICON-ESM-LR/historical/r1i1p1f1/SImon/siconc/gn/v20210215/'

nlat = np.arange(-88.75, 90, 2.5)
nlon = np.arange(1.25, 360, 2.5)
ngrid = xcdat.regridder.grid.create_grid(nlat, nlon)

ds = xcdat.open_mfdataset(dpath + '*.nc')
ds = ds.isel(time=[0])
ds2 = ds.regridder.horizontal('siconc', ngrid, tool='xesmf', method='conservative_normed', periodic=True)
```
Additional context
The two paths I specified above might have other issues preventing regridding, but I run into memory issues before that happens.