
[Feature]: Improve Memory Handling in Regridding routines

Open pochedls opened this issue 2 years ago • 0 comments

Is your feature request related to a problem?

When regridding large datasets you occasionally run into memory issues, e.g.:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 413. GiB for an array with shape (235403, 235403) and data type float64

or

TypeError: buffer is too small for requested array

Describe the solution you'd like

Ideally xcdat would:

  • pre-empt these issues by chunking the data or raising a warning when trouble looks likely (based on array size)
  • or work better with dask/chunking (maybe there are options we can set or recommend); chunking large datasets and looping over those chunks might help
  • or, lastly, provide some hints when these exceptions occur
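The first suggestion could look like a pre-flight check: a dense conservative-regridding weight matrix scales with n_src × n_dst cells, so its size can be estimated before anything is allocated. A minimal sketch follows; the function names and the 16 GiB threshold are hypothetical, not part of xcdat:

```python
import warnings


def estimate_weight_matrix_gib(n_src_cells: int, n_dst_cells: int,
                               itemsize: int = 8) -> float:
    """Estimated size in GiB of a dense (n_src, n_dst) float64 weight matrix."""
    return n_src_cells * n_dst_cells * itemsize / 2**30


def check_regrid_memory(n_src_cells: int, n_dst_cells: int,
                        limit_gib: float = 16.0) -> float:
    """Hypothetical pre-flight check: warn when the weight matrix would be huge."""
    gib = estimate_weight_matrix_gib(n_src_cells, n_dst_cells)
    if gib > limit_gib:
        warnings.warn(
            f"Regridding may allocate ~{gib:.0f} GiB for a dense "
            f"({n_src_cells}, {n_dst_cells}) weight matrix; consider chunking."
        )
    return gib


# The array from the traceback above: (235403, 235403) float64 is ~413 GiB.
print(round(estimate_weight_matrix_gib(235403, 235403)))  # -> 413
```

This reproduces the 413 GiB figure from the traceback (235403² × 8 bytes ≈ 443 GB ≈ 413 GiB), so a size-based warning would have fired well before the allocation failed.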

Describe alternatives you've considered

I tried both chunking the data (e.g., ds = xcdat.open_dataset(..., chunks={'time': 100})) and regridding just one time step (e.g., ds = ds.isel(time=[0])).

import xcdat
import numpy as np

dpath = '/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/historical/r1i1p1f1/SImon/siconc/gn/v20200212/'  # first candidate (overridden below)
dpath = '/p/css03/esgf_publish/CMIP6/CMIP/MPI-M/ICON-ESM-LR/historical/r1i1p1f1/SImon/siconc/gn/v20210215/'

nlat = np.arange(-88.75, 90, 2.5)
nlon = np.arange(1.25, 360, 2.5)
ngrid = xcdat.regridder.grid.create_grid(nlat, nlon)

ds = xcdat.open_mfdataset(dpath + '*.nc')
ds = ds.isel(time=[0])
ds2 = ds.regridder.horizontal('siconc', ngrid, tool='xesmf', method='conservative_normed', periodic=True)
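The chunk-and-loop idea from the suggestions above can be illustrated with plain NumPy. The regrid step is stubbed out with a cheap reduction so the sketch is runnable without xcdat; in practice it would be the regridder.horizontal call from the snippet, and the block size is arbitrary:

```python
import numpy as np


def regrid_one_block(block: np.ndarray) -> np.ndarray:
    # Stand-in for the real per-block regrid, e.g.
    # block.regridder.horizontal('siconc', ngrid, tool='xesmf', ...).
    # Here we just average over the cell axis to keep the sketch runnable.
    return block.mean(axis=-1, keepdims=True)


def regrid_in_time_blocks(data: np.ndarray, block_size: int = 100) -> np.ndarray:
    """Process a (time, cell) array in blocks along the time axis and
    concatenate the results, so peak memory is bounded by one block."""
    pieces = [regrid_one_block(data[t:t + block_size])
              for t in range(0, data.shape[0], block_size)]
    return np.concatenate(pieces, axis=0)


data = np.random.rand(250, 1000)  # 250 time steps, 1000 source cells
out = regrid_in_time_blocks(data, block_size=100)
print(out.shape)  # -> (250, 1)
```

If xcdat looped like this internally (or documented the pattern), only one block's worth of intermediate arrays would be alive at a time, which is the behavior the feature request is after.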

Additional context

The two paths specified above may have other issues that prevent regridding, but I run into memory errors before getting that far.

pochedls, Sep 28 '22