xESMF icon indicating copy to clipboard operation
xESMF copied to clipboard

Proposed new regridder save-load API in 0.3.0 (not backward compatible)

Open JiaweiZhuang opened this issue 4 years ago • 7 comments

I'd like to implement a new regridder save-load API to resolve #11. ESMPy 8.0.0 is the prerequisite for this, and is now available on Conda conda-forge/esmpy-feedstock#26

Before implementing this, I'd like users of xESMF to review and comment on the new API, because there will be backward-incompatible changes.

Description of the changes

The current & old API (until 0.2.x) is:

regridder = xesmf.Regridder(
    grid_in, grid_out, method, filename=filename
    )
regridder_reload = xesmf.Regridder(
    grid_in, grid_out, method, filename=filename, reuse_weights=True
    )

The first call writes out a NetCDF file containing regridding weights, and reads the file back to memory to construct a SciPy sparse matrix (representing the regridding operation). Such extra I/O exists because ESMPy < 8.0.0 can only dump weights to disk. ESMPy 8.0.0 allows in-memory retrival of weights as numpy arrays, so such disk I/O is no longer needed.

The second call, with reuse_weights=True, detects that the weight file exists and simply reads it back without re-computation.

filename is an optional kwarg. If not specified, the regridder file will be given a default name like bilinear_25x53_59x87.nc, indicating (regrid method, input grid shape, output grid shape).

The new API (starting 0.3.0) will be:

regridder = xesmf.Regridder(grid_in, grid_out, method)
regridder.save(filename)
regridder_reload = xesmf.load_regridder(filename)

The first call only constructs the regridder in-memory, and will not write any files to disk.

Instead of implicitly handing the save/load as part of xesmf.Regridder() call, the new save() and load_regridder() do so explicitly. They are similar to save()/load_model() in Keras or save/load() in PyTorch.

The original reuse_weights and filename kwargs will be removed in xesmf.Regridder().

Major question on backward-compatibility

Should the reuse_weights=True kwarg be kept for backward-compatibility? My inclination is no, as there should only be one unambiguous way to load the regridder.

However, one convenient thing about the old xesmf.Regridder(..., reuse_weights=True) API is that this single line of code works in any cases. If the weight file doesn't exist on disk, xesmf builds it and writes to disk; if exists, xesmf reads it back without re-computation.

The same logic with the new API would be:

filename = 'your-regridder-name.nc'
if os.path.exists(filename):
    regridder = xesmf.load_regridder(filename)
else:
    regridder = xesmf.Regridder(grid_in, grid_out, method)
    regridder.save(filename)

This is more verbose, but also more explicit, and explicit is better than implicit

Minor questions

  • Should it be xesmf.load_regridder() or just xesmf.load()? I prefer the former as it's more explicit.
  • Should regridder.save() has a default filename? I slightly prefer no, but some people might have a hard time choosing a name :) Can still use the current default naming scheme like bilinear_25x53_59x87.nc. Welcome other suggestions.

JiaweiZhuang avatar Nov 17 '19 02:11 JiaweiZhuang

Major question: I think it should be kept. Minor questions: 1.I prefer xesmf.load(), is there any other thing that we can load? 2.I think we have to define the name before saving... But if you want set a default filename, you have to remind the users if the filename already exists.

zhonghua-zheng avatar Nov 17 '19 02:11 zhonghua-zheng

is there any other thing that we can load?

Good point... I haven't thought of a case. One concern is that load() can be confused with the frequently used xarray.Dataset.load() which reads lazy data into memory. xesmf.load_regridder() looks more similar to the naming of xarray.load_dataset() which also reads a file from disk.

Major question: I think it should be kept.

I am thinking about catching the use_weights and filename kwargs, throwing an instructive error message, and maybe directing people to this issue. This seems good enough for not breaking the current user workflow?

JiaweiZhuang avatar Nov 17 '19 03:11 JiaweiZhuang

Should the reuse_weights=True kwarg be kept for backward-compatibility?

This new API seems like a nice step forward, and the package is young enough that you shouldn't worry too much about backwards compatibility IMO. It's still 0.X after all 😄 Thanks for xESMF! It's an excellent tool.

spencerahill avatar Nov 19 '19 00:11 spencerahill

Would it be helpful to document whether the longitudes range from -180:180 vs 0:360 in the filename like: bilinear_25x53_59x87_180.nc If I initially have a file that ranges from -180 to 180, but then have another file with the same dimensions from 0 to 360 and I reuse weights, I think the result is inaccurate.

Also, I think it'd be nice to have an output_dir kwarg under the save method so that users can keep the default format, but be able to output the save in a scratch directory.

What about xesmf.open_regridder() like xr.open_dataset?

I prefer having a default filename since naming is hard.

ahuang11 avatar Nov 22 '19 03:11 ahuang11

regridder.save would have to save not only the weights, but also the parameters passed to the original Regridder call, including the lat and lon coordinates, right ? This could create headaches if you want to maintain backward compatibility to files created with earlier versions.

Maybe a simpler option would be to have a weights argument in Regridder :

weights : None, str, Path, xarray.Dataset, scipy.sparse.coo_matrix, dict
  Regridding weights, stored as a sparse coo matrix, a dictionary with keys `row_dst`, `col_src` and 
  `weights`, an xarray.Dataset with data variables `col`, `row` and `S`, or a path to a netCDF file 
  created by ESMF. If None, compute the weights. 

An alternative to save could be to_netcdf, which would save the weights only as a file using the same naming as ESMF.

huard avatar Apr 16 '20 01:04 huard

The new API (starting 0.3.0) will be:

regridder = xesmf.Regridder(grid_in, grid_out, method)
regridder.save(filename)
regridder_reload = xesmf.load_regridder(filename)

I encountered the same MemoryError when doing loops. I have updated xESMF to 0.3.0 but have not found these methods. Does this mean the backward-compatibility has been kept? With the latest version and the same calling, I still see the memory cannot be released timely, see my following screenshot. Did I miss something?

image

FeiYao-Edinburgh avatar May 08 '20 16:05 FeiYao-Edinburgh

The API changes proposed above have not been released, nor merged into master. If you want to take a look at the branch #91 and provide feedback, that would be useful.

huard avatar May 08 '20 17:05 huard