
Problem with parallel=True: no mask in input grid, but the error is related to the mask

Open axelschweiger opened this issue 1 year ago • 5 comments

I'm experimenting with parallel=True for regridding some larger datasets. I found the documentation stating that the output grid has to have a data variable and followed the instructions to just make one. But I'm encountering a strange problem with the input grid. My input grid only contains lat and lon, each a 1D array. I have tried to actually specify a mask, but that gives the same result. I'm using bilinear regridding, so no bounds should be necessary. It works with parallel=False, albeit very slowly. Thanks for any help on this.

image

I am getting an error that mask.shape isn't the same as lon.shape, even though I don't have a mask and the input lon is only one-dimensional.

image

axelschweiger avatar Nov 25 '24 03:11 axelschweiger

Correction (I messed up): if I don't specify a mask on the input, I get an error about not having lat, lon values or not being CF-compliant (see below). If I do, I get the above error about the wrong shape. I tried to transpose the mask, but that doesn't work either.

image

axelschweiger avatar Nov 25 '24 03:11 axelschweiger

Hmm, these errors seem to occur before anything in the "parallel regridder" code is touched. I see that lat and lon have no attributes. Even though it should work anyway with those names, I would suggest adding units='degrees_north' to lat and units='degrees_east' to lon.

Also, can you send a printout of your dataset with the mask added ?

Finally, as said in the other PR, I don't think xESMF's parallel option is actually helping you here. Since both grids fit in your RAM and your source grid is much bigger than the destination, it won't be faster than running with parallel=False; it might even be slower. The best performance should come from the ESMF command-line tools with MPI.

aulemahal avatar Nov 25 '24 15:11 aulemahal

Thanks for the reply. I tried setting the units attributes; no difference. See below for input grid details. I guess the "parallel" option isn't going to get me anywhere faster. I had tried the ESMF_RegridWeightGen script as described here: https://github.com/pangeo-data/xESMF/issues/405 but I get error messages that overflow my disk. (I guess there is no way to turn off the logging? I was wondering whether there is an environment variable, but I can't find anything.)

20241121 134756.230 ERROR PET0 ESMCI_DistGrid.C:5101 ESMCI::DistGrid::getSequenceInde Invalid argument - SeqIndex type mismatch detected

image

image

Here is the code snippet that creates the error:

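import dask.array as da
import xarray as xr
import xesmf as xe

# imUtils is the poster's own helper module; `ib` holds the input coordinates (definition not shown)
sc,vc,zc = imUtils.getAncilData(gridName="864X640") # reads output grid coordinates
stype='vector'

# all-True input mask with shape (lon, lat)
imask = da.ones((17280,2880),
               dtype=bool, chunks=(100, 100))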
grid_in= xr.Dataset(
    data_vars=dict(
        mask=(["lon","lat"], imask)),
    coords=dict(
        lon=(["lon"], ib.lon.data),
        lat=(["lat"], ib.lat.data),
    )
)

grid_in['lat'].attrs['units']='degrees_north'
grid_in['lon'].attrs['units']='degrees_east'

grid_out={'lon':vc.lon.chunk({"y_sn":100,"x_ew":100}),
          'lat':vc.lat.chunk({"y_sn":100,"x_ew":100})}
grid_out=xr.Dataset(grid_out)
#grid_in['lat'].attrs['units']='degrees_north'
#grid_in['lon'].attrs['units']='degrees_east']

reuse_flag=False

omask = da.ones((grid_out.dims['y_sn'],
                 grid_out.dims['x_ew']),
                 dtype=bool, chunks=(100, 100))

grid_out["mask"] = (grid_out.dims, omask)

regridder = xe.Regridder(grid_in, grid_out, 'bilinear',reuse_weights=reuse_flag, 
                         filename='gebco_weights864X640_5x5.'+stype+'.bl.nc',
                         unmapped_to_nan=True, parallel=True)



axelschweiger avatar Nov 25 '24 23:11 axelschweiger

I know it sounds stupid, but just for testing, could you try transposing the mask?

grid_in= xr.Dataset(
    data_vars=dict(
        mask=(["lat","lon"], imask.T)),
    coords=dict(
        lon=(["lon"], ib.lon.data),
        lat=(["lat"], ib.lat.data),
    )
)

I think there's some hardcoding of the dimension order going on, which shouldn't be the case with xarray-based stuff like this...
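For anyone wanting to try this without the original data, here is a minimal, self-contained sketch of the transposed layout. The small sizes (8 and 4) and the synthetic coordinates are stand-ins I made up for the real 17280 x 2880 grid, and a NumPy array replaces the dask one for brevity. It only checks that the dimension names and the data shape stay in sync when the mask is transposed; it does not call xESMF and makes no claim about whether the regridder accepts the result.

```python
import numpy as np
import xarray as xr

# Hypothetical small sizes standing in for the 17280 x 2880 grid in the thread.
nlon, nlat = 8, 4
lon = np.linspace(-180.0, 180.0, nlon, endpoint=False)
lat = np.linspace(-90.0, 90.0, nlat)

# Mask created in (lon, lat) order, as in the original snippet.
imask = np.ones((nlon, nlat), dtype=bool)

# Transposing the data AND swapping the dimension names keeps them consistent.
grid_in = xr.Dataset(
    data_vars=dict(mask=(["lat", "lon"], imask.T)),
    coords=dict(lon=(["lon"], lon), lat=(["lat"], lat)),
)
grid_in["lat"].attrs["units"] = "degrees_north"
grid_in["lon"].attrs["units"] = "degrees_east"

mask_dims = grid_in["mask"].dims
mask_shape = grid_in["mask"].shape
```

The point is that `imask.T` and the `["lat", "lon"]` label list have to change together; changing only one of them leaves the mask disagreeing with the 1D coordinates.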

aulemahal avatar Nov 25 '24 23:11 aulemahal

I had tried this before. I also tried reversing the order of lat and lon in the xarray dataset, which just changes the error so that lat is the offending variable. See the error below.


ValueError                                Traceback (most recent call last)
Cell In[26], line 9
      4 #grid_in={'lon':ib.lon.data,'lat':ib.lat.data}
      5 # 'lon_b':ib.lon_b,'lat_b':ib.lat_b}
      6 #grid_in=xr.Dataset(grid_in)
      7 imask = da.ones((17280,2880),
      8                dtype=bool, chunks=(100, 100))
----> 9 grid_in= xr.Dataset(
     10     data_vars=dict(
     11         mask=(["lon","lat"], imask.T)),
     12     coords=dict(
     13         lon=(["lon"], ib.lon.data),
     14         lat=(["lat"], ib.lat.data),
     15     )
     16 )
     18 grid_in['lat'].attrs['units']='degrees_north'
     19 grid_in['lon'].attrs['units']='degrees_east'

File ~/anaconda3/envs/pangeo310/lib/python3.10/site-packages/xarray/core/dataset.py:605, in Dataset.__init__(self, data_vars, coords, attrs)
    602 if isinstance(coords, Dataset):
    603     coords = coords.variables
--> 605 variables, coord_names, dims, indexes, _ = merge_data_and_coords(
    606     data_vars, coords, compat="broadcast_equals"
    607 )
    609 self._attrs = dict(attrs) if attrs is not None else None
    610 self._close = None

File ~/anaconda3/envs/pangeo310/lib/python3.10/site-packages/xarray/core/merge.py:575, in merge_data_and_coords(data_vars, coords, compat, join)
    573 objects = [data_vars, coords]
    574 explicit_coords = coords.keys()
--> 575 return merge_core(
    576     objects,
    577     compat,
    578     join,
    579     explicit_coords=explicit_coords,
    580     indexes=Indexes(indexes, coords),
    581 )

File ~/anaconda3/envs/pangeo310/lib/python3.10/site-packages/xarray/core/merge.py:761, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
    756 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
    757 variables, out_indexes = merge_collected(
    758     collected, prioritized, compat=compat, combine_attrs=combine_attrs
    759 )
--> 761 dims = calculate_dimensions(variables)
    763 coord_names, noncoord_names = determine_coords(coerced)
    764 if explicit_coords is not None:

File ~/anaconda3/envs/pangeo310/lib/python3.10/site-packages/xarray/core/variable.py:3208, in calculate_dimensions(variables)
   3206     last_used[dim] = k
   3207 elif dims[dim] != size:
-> 3208     raise ValueError(
   3209         f"conflicting sizes for dimension {dim!r}: "
   3210         f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"
   3211     )
   3212 return dims

ValueError: conflicting sizes for dimension 'lon': length 17280 on 'lon' and length 2880 on {'lon': 'mask', 'lat': 'mask'}
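
For reference, this particular ValueError is raised by xarray while building the dataset, before xESMF is involved: in the cell above the mask data is transposed (imask.T) but its dimension labels are still ["lon","lat"]. Below is a rough pure-Python sketch of that kind of size check. The function `check_dimension_sizes` is a simplified stand-in written for illustration, not xarray's actual implementation, and the (dims, shape) input format is an assumption of this sketch.

```python
def check_dimension_sizes(variables):
    """Simplified stand-in for a dimension-size consistency check.

    `variables` maps a variable name to a (dims, shape) pair.
    Returns a dict of dimension sizes, or raises ValueError on a conflict.
    """
    sizes = {}
    last_used = {}
    for name, (dims, shape) in variables.items():
        for dim, size in zip(dims, shape):
            if dim not in sizes:
                sizes[dim] = size
                last_used[dim] = name
            elif sizes[dim] != size:
                raise ValueError(
                    f"conflicting sizes for dimension {dim!r}: "
                    f"length {size} on {name!r} and length {sizes[dim]} "
                    f"on {last_used[dim]!r}"
                )
    return sizes


NLON, NLAT = 17280, 2880
transposed_shape = (NLAT, NLON)  # shape of imask.T

# Labels swapped together with the data: consistent.
consistent = check_dimension_sizes({
    "mask": (("lat", "lon"), transposed_shape),
    "lon": (("lon",), (NLON,)),
    "lat": (("lat",), (NLAT,)),
})

# Data transposed but labels left as ("lon", "lat"): conflict, as in the cell above.
try:
    check_dimension_sizes({
        "mask": (("lon", "lat"), transposed_shape),
        "lon": (("lon",), (NLON,)),
        "lat": (("lat",), (NLAT,)),
    })
    message = None
except ValueError as err:
    message = str(err)
```

With the labels left as ("lon", "lat"), the transposed mask claims lon has length 2880, which clashes with the 17280-element lon coordinate.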

axelschweiger avatar Nov 26 '24 00:11 axelschweiger