MemoryError when opening larger 0..360 datasets
When opening larger datasets that are defined on a 0..360 longitude grid, such as esa_msla_ext provided by Prosper, a MemoryError is encountered on machines with modest RAM (mine has 8 GB).
This is due to these lines:
https://github.com/CCI-Tools/cate/blob/a1e31c4673399a99d58786d1208c2ba13de138f2/cate/core/opimpl.py#L185-L193
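For reference, the pattern at those lines looks roughly like this (a paraphrased sketch, not the exact cate code; `swap_lon_halves_eager` is a made-up name):

```python
import numpy as np
import xarray as xr

def swap_lon_halves_eager(ds: xr.Dataset) -> xr.Dataset:
    """Paraphrased sketch of the memory-unsafe pattern."""
    for name in list(ds.data_vars):
        var = ds[name]
        if 'lon' in var.dims:
            axis = var.dims.index('lon')
            # .values materializes the WHOLE variable as an np.ndarray,
            # which is what blows up on an 8 GB machine
            shifted = np.roll(var.values, var.sizes['lon'] // 2, axis=axis)
            ds[name] = (var.dims, shifted)
    # (relabeling of the lon coordinate omitted for brevity)
    return ds
```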
Calling `var.values` tries to load the entire variable into memory as an `np.ndarray`. A more memory-safe approach would be to recursively `groupby` the variable until only lat/lon (or even just lon) remains and do the value swapping on those small slices, similar to how it is done in coregistration (see the sketch after the link below):
https://github.com/CCI-Tools/cate/blob/a1e31c4673399a99d58786d1208c2ba13de138f2/cate/ops/coregistration.py#L258
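A sketch of that recursive-groupby idea (`_swap_lon_halves` is a hypothetical helper, only loosely modeled on the coregistration code linked above):

```python
import numpy as np
import xarray as xr

def _swap_lon_halves(var: xr.DataArray) -> xr.DataArray:
    """Hypothetical helper: recurse via groupby until only lat/lon
    (or just lon) carry size > 1, then swap the longitude halves of
    each small slice."""
    extra_dims = [d for d in var.dims
                  if d not in ('lat', 'lon') and var.sizes[d] > 1]
    if extra_dims:
        # Peel off one extra dimension (e.g. 'time') per recursion
        # level; each group slice is then processed independently
        return var.groupby(extra_dims[0]).map(_swap_lon_halves)
    # Base case: only a single lat/lon slice is loaded at a time
    axis = var.dims.index('lon')
    values = np.roll(var.values, var.sizes['lon'] // 2, axis=axis)
    return var.copy(data=values)
```

Note this still materializes each slice as numpy, which is exactly the caveat in the EDIT below.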
EDIT: Even when 'groupbying' down to lat/lon, don't call `.values` on the data variable, as this will slowly convert the new dataset into an xarray dataset backed by many in-memory numpy arrays. Instead, do the conversion using index-based selection, so the underlying (dask) arrays stay lazy.
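For example (a sketch only, not necessarily what ends up in cate; `normalize_lon_to_pm180` is a made-up name): adjust the 1-D lon coordinate arithmetically and reorder the dataset with `sortby`, which reduces to integer indexing and therefore stays lazy for dask-backed variables:

```python
import xarray as xr

def normalize_lon_to_pm180(ds: xr.Dataset) -> xr.Dataset:
    """Sketch: convert a 0..360 dataset to -180..180 without loading data.

    Only the small 1-D lon coordinate is computed eagerly; the data
    variables are reordered via integer indexing, which dask evaluates
    lazily, chunk by chunk.
    """
    # Map 0..360 to -180..180 (coordinate only, a tiny 1-D array)
    ds = ds.assign_coords(lon=(((ds.lon + 180.0) % 360.0) - 180.0))
    # sortby uses isel with an integer index under the hood -> lazy for dask
    return ds.sortby('lon')
```

Opened with chunks (e.g. `xr.open_dataset(path, chunks={'lat': 3600, 'lon': 3600})`, chunk sizes illustrative), nothing larger than a single chunk should ever be materialized.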
Assigning @forman to figure out how to approach this and to make sure it doesn't disappear in the noise.
We came across this issue recently when trying to open CCI Land Cover with dimension sizes (lon=120000, lat=60000) prepared for the Copernicus Climate Data Store. That is 7.2e9 cells per 2-D variable, i.e. ~7 GB even at one byte per value and ~29 GB at float32, so calling `.values` is hopeless on most machines. And their convention is 0 <= lon < 360! OMG.
@JanisGailis happy to discuss a solution with you early next week.
Sure. I'm quite sure I know what the problem is.