
MemoryError when opening larger 0..360 datasets

Open JanisGailis opened this issue 7 years ago • 4 comments

When opening larger datasets that are defined on a 0..360 longitude grid, such as esa_msla_ext provided by Prosper, a MemoryError is encountered on some machines (mine has 8 GB of RAM).

This is due to these lines:

https://github.com/CCI-Tools/cate/blob/a1e31c4673399a99d58786d1208c2ba13de138f2/cate/core/opimpl.py#L185-L193

Calling var.values tries to load the entire variable into memory as an np.ndarray. A more memory-safe approach would be to recursively group the variable with groupby until only a (lat, lon) slice (or even just lon) remains, and then do the value swapping per slice, similar to how it is done in coregistration:

https://github.com/CCI-Tools/cate/blob/a1e31c4673399a99d58786d1208c2ba13de138f2/cate/ops/coregistration.py#L258
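
For illustration only (this is not the coregistration code itself), here is a minimal sketch of what the per-slice value swapping could look like once a single (lat, lon) slice has been reached. It assumes an evenly spaced longitude grid that splits exactly at 180 degrees; the coordinate name lon and the function name are made up:

```python
import numpy as np
import xarray as xr


def swap_lon_slice(slice_2d: xr.DataArray) -> xr.DataArray:
    """Reorder one (lat, lon) slice from 0..360 to -180..180 longitudes."""
    n = slice_2d.sizes['lon']
    shift = n // 2  # assumes the grid splits evenly at 180 degrees
    # Move the 180..360 half of the data in front of the 0..180 half.
    rolled = slice_2d.roll(lon=shift, roll_coords=False)
    # Relabel the coordinate to match; .values on the small 1-D coordinate
    # is harmless, unlike .values on the full data variable.
    lon = slice_2d['lon'].values
    new_lon = np.where(lon >= 180.0, lon - 360.0, lon)
    return rolled.assign_coords(lon=np.roll(new_lon, shift))
```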

EDIT: Even when grouping down to lat/lon, don't call .values on the data variable, as this will slowly turn the new dataset into an xarray dataset consisting of many in-memory numpy arrays. Instead, do the conversion purely through indexing, so the data stays lazy.
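
As a rough sketch of what such an index-based conversion could look like for a whole dataset (assuming dask-backed data variables and a 1-D longitude coordinate named lon; the file name and chunking are made up):

```python
import xarray as xr

# Open lazily; passing chunks= keeps the data variables dask-backed.
ds = xr.open_dataset('esa_msla_ext.nc', chunks={'time': 1})

# Relabel 0..360 longitudes as -180..180 and sort by the new coordinate.
# sortby only reorders via indexing, so no data variable is loaded into
# memory here; everything stays lazy until it is actually computed.
ds = ds.assign_coords(lon=(((ds.lon + 180.0) % 360.0) - 180.0))
ds = ds.sortby('lon')
```

This is just one way to express the idea; the point is that the reordering happens through coordinate relabelling and indexing rather than through .values.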

JanisGailis avatar Jul 14 '18 21:07 JanisGailis

Assigning @forman to figure out how to approach this and to make sure it doesn't disappear in the noise.

JanisGailis avatar Jul 14 '18 21:07 JanisGailis

We came across this issue recently when trying to open the CCI Land Cover dataset with dimension sizes lon=120000, lat=60000, prepared for the Copernicus Climate Data Store. Their convention is 0 <= lon < 360! OMG.

forman avatar Sep 07 '18 09:09 forman

@JanisGailis happy to discuss a solution with you early next week.

forman avatar Sep 07 '18 09:09 forman

Sure. I'm quite sure I know what the problem is.

JanisGailis avatar Sep 07 '18 09:09 JanisGailis