xESMF
High memory usage
The memory usage of xESMF seems quite high, higher than the standalone ESMF CLI. I wonder whether this is due to a leak, since loading from a saved weights file uses much less RAM. Perhaps it could also be reduced by using lower precision (32-bit vs 64-bit). I also note that it doesn't seem possible to destroy a Regridder object to free memory.
import xesmf
import numpy as np
import gc
@profile  # provided as a builtin when run via "python -m memory_profiler"
def test():
    src_ds = {'lat': np.arange(29.5,70.5,0.05), 'lon': np.arange(-23.5,45.0,0.05)}
    dst_ds = xesmf.util.grid_2d(29, 70, 0.03, -23, 45, 0.03)
    regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear')
    regridder = None
    gc.collect()
    print("collected")
test()
python3 -m memory_profiler memtest.py
Line # Mem usage Increment Line Contents
================================================
4 65.500 MiB 65.500 MiB @profile
5 def test():
6 65.500 MiB 0.000 MiB src_ds = {'lat': np.arange(29.5,70.5,0.05), 'lon': np.arange(-23.5,45.0,0.05)}
7 160.461 MiB 94.961 MiB dst_ds = xesmf.util.grid_2d(29, 70, 0.03, -23, 45, 0.03)
8 1246.199 MiB 1085.738 MiB regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear')
9 1246.203 MiB 0.004 MiB regridder = None
10 1246.203 MiB 0.000 MiB gc.collect()
11 1246.203 MiB 0.000 MiB print("collected")
Thanks for reporting this issue. I've been wanting to diagnose the memory problem for a long time, and have just taken a closer look at this.
As background, ESMPy relies on an explicit destroy() call to release the underlying Fortran memory of almost every ESMF object. I do release that memory after constructing the regridder (code), but there still seem to be uncleaned module-level allocations. The next version of ESMPy (v8.0.0) adds a new ESMF.Manager().destroy() call, which should clean up more of that memory.
The higher-level xesmf.Regridder API is essentially just a SciPy sparse matrix, so garbage collection should work as it does for ordinary NumPy/SciPy objects.
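Because the regridder is essentially a sparse matrix, its footprint can be estimated from the number of nonzero weights. This also bears on the 32-bit question in the original report. The sketch below is a rough estimate that assumes bilinear weights with at most four nonzeros per destination point and 4-byte (int32) indices. `sparse_weight_bytes` is a hypothetical helper, and the real size depends on the sparse format and dtypes actually used:

```python
def sparse_weight_bytes(n_dst, nnz_per_row=4, value_bytes=8, index_bytes=4, fmt="coo"):
    """Rough size in bytes of a regridding weight matrix (assumption-based estimate)."""
    nnz = n_dst * nnz_per_row  # assumed: at most nnz_per_row weights per destination point
    if fmt == "coo":
        # one value plus a row index and a column index per nonzero
        return nnz * (value_bytes + 2 * index_bytes)
    if fmt == "csr":
        # one value plus a column index per nonzero, plus n_dst + 1 row pointers
        return nnz * (value_bytes + index_bytes) + (n_dst + 1) * index_bytes
    raise ValueError(f"unknown format: {fmt!r}")


n_dst = 1_000_000  # hypothetical destination grid with one million points
for fmt in ("coo", "csr"):
    for value_bytes, name in ((8, "float64"), (4, "float32")):
        mib = sparse_weight_bytes(n_dst, value_bytes=value_bytes, fmt=fmt) / 2**20
        print(f"{fmt} {name}: {mib:.1f} MiB")
```

As a rough check, the destination grid in the original report spans about 41° by 68° at 0.03° spacing, or roughly 3.1 million points. This estimate puts that at about 190 MiB for COO float64 weights. Switching to float32 would save 4 of the 16 bytes per nonzero in that layout, about a quarter of the weight storage. That saving is small next to the ~1 GB increment observed while constructing the regridder.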
If you use xesmf.Regridder(src_ds, dst_ds, 'bilinear', reuse_weights=True), memory usage will be much lower, because the weights are read from file instead of being computed through ESMPy.
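The idea behind reuse_weights=True is to pay for the expensive weight computation once and then only read the result. The pattern can be sketched generically. `load_or_compute` is a hypothetical helper, and pickle is used only to keep the example self-contained; xESMF itself stores its weights in a NetCDF file:

```python
import os
import pickle


def load_or_compute(path, compute):
    """Return cached weights from `path`, computing and saving them only if missing."""
    if os.path.exists(path):
        # cache hit: the expensive computation (and its memory peak) is skipped
        with open(path, "rb") as f:
            return pickle.load(f)
    weights = compute()  # expensive step, e.g. the ESMF weight calculation
    with open(path, "wb") as f:
        pickle.dump(weights, f)
    return weights
```

Here the cache key is just the file path, so a stale file would be silently reused if the grids changed. The same caution applies to any file-based weight reuse.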
More details
To demonstrate that the memory issue comes from the underlying ESMPy calls, consider this esmpy_memory.py script:
"""A minimum script to test ESMPy memory allocation."""

import numpy as np
import ESMF
from memory_profiler import profile

def create_grid(shape):
    grid = ESMF.Grid(np.array(shape),
                     staggerloc=ESMF.StaggerLoc.CENTER,
                     coord_sys=ESMF.CoordSys.SPH_DEG)
    return grid

def fill_grid(grid, lons, lats):
    lon_pointer = grid.get_coords(coord_dim=0,
                                  staggerloc=ESMF.StaggerLoc.CENTER)
    lat_pointer = grid.get_coords(coord_dim=1,
                                  staggerloc=ESMF.StaggerLoc.CENTER)
    lon_pointer[:] = lons
    lat_pointer[:] = lats

@profile
def test_esmpy():
    # define test grids
    lons_in, lats_in = np.meshgrid(
        np.arange(-120, 120, 0.4),
        np.arange(-60, 60, 0.3)
    )

    lons_out, lats_out = np.meshgrid(
        np.arange(-120, 120, 0.6),
        np.arange(-60, 60, 0.4)
    )

    # build ESMPy regridder
    sourcegrid = create_grid(lons_in.shape)
    destgrid = create_grid(lons_out.shape)

    fill_grid(sourcegrid, lons_in, lats_in)
    fill_grid(destgrid, lons_out, lats_out)

    sourcefield = ESMF.Field(sourcegrid)
    destfield = ESMF.Field(destgrid)

    regrid = ESMF.Regrid(sourcefield, destfield, filename=None,
                         regrid_method=ESMF.RegridMethod.BILINEAR,
                         unmapped_action=ESMF.UnmappedAction.IGNORE)

    # release underlying Fortran memory
    sourcegrid.destroy()
    destgrid.destroy()
    sourcefield.destroy()
    destfield.destroy()
    regrid.destroy()

    # de-reference Python objects
    sourcegrid = None
    destgrid = None
    sourcefield = None
    destfield = None
    regrid = None

    lons_in = None
    lats_in = None
    lons_out = None
    lats_out = None

if __name__ == '__main__':
    test_esmpy()
python -m memory_profiler esmpy_memory.py
generates:
Filename: esmpy_memory.py
Line # Mem usage Increment Line Contents
================================================
21 59.7 MiB 59.7 MiB @profile
22 def test_esmpy():
23 # define test grids
24 59.7 MiB 0.0 MiB lons_in, lats_in = np.meshgrid(
25 59.7 MiB 0.0 MiB np.arange(-120, 120, 0.4),
26 63.6 MiB 3.8 MiB np.arange(-60, 60, 0.3)
27 )
28
29 63.6 MiB 0.0 MiB lons_out, lats_out = np.meshgrid(
30 63.6 MiB 0.0 MiB np.arange(-120, 120, 0.6),
31 65.4 MiB 1.8 MiB np.arange(-60, 60, 0.4)
32 )
33
34 # build ESMPy regridder
35 76.3 MiB 11.0 MiB sourcegrid = create_grid(lons_in.shape)
36 78.4 MiB 2.1 MiB destgrid = create_grid(lons_out.shape)
37
38 78.4 MiB 0.0 MiB fill_grid(sourcegrid, lons_in, lats_in)
39 78.4 MiB 0.0 MiB fill_grid(destgrid, lons_out, lats_out)
40
41 78.4 MiB 0.0 MiB sourcefield = ESMF.Field(sourcegrid)
42 78.4 MiB 0.0 MiB destfield = ESMF.Field(destgrid)
43
44 78.4 MiB 0.0 MiB regrid = ESMF.Regrid(sourcefield, destfield, filename=None,
45 78.4 MiB 0.0 MiB regrid_method=ESMF.RegridMethod.BILINEAR,
46 434.2 MiB 355.8 MiB unmapped_action=ESMF.UnmappedAction.IGNORE)
47
48 # release underlying Fortran memory
49 430.8 MiB 0.0 MiB sourcegrid.destroy()
50 430.8 MiB 0.0 MiB destgrid.destroy()
51 430.8 MiB 0.0 MiB sourcefield.destroy()
52 430.8 MiB 0.0 MiB destfield.destroy()
53 390.2 MiB 0.0 MiB regrid.destroy()
54
55 # de-reference Python objects
56 390.2 MiB 0.0 MiB sourcegrid = None
57 390.2 MiB 0.0 MiB destgrid = None
58 390.2 MiB 0.0 MiB sourcefield = None
59 390.2 MiB 0.0 MiB destfield = None
60 390.2 MiB 0.0 MiB regrid = None
61
62 388.3 MiB 0.0 MiB lons_in = None
63 386.5 MiB 0.0 MiB lats_in = None
64 385.6 MiB 0.0 MiB lons_out = None
65 384.7 MiB 0.0 MiB lats_out = None
The regrid.destroy() call reduced memory usage slightly, but not by much. This profiling result should be accurate, as free -h and docker stats report similar memory usage.
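Another way to cross-check memory_profiler is to read the process's resident set size directly from inside the script. The sketch below is Linux-only and assumes /proc is available. `rss_mib` is a hypothetical helper that reads the current (VmRSS) and peak (VmHWM) fields of /proc/self/status, which the kernel reports in kB (units of 1024 bytes):

```python
def rss_mib(field="VmRSS"):
    """Return a memory field of the current process in MiB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                # e.g. "VmRSS:\t  123456 kB"; the kernel's "kB" means 1024 bytes
                return int(line.split()[1]) / 1024
    raise KeyError(f"{field} not found in /proc/self/status")


before = rss_mib()
blob = b"x" * (100 * 2**20)  # writing 100 MiB makes those pages resident
print(f"RSS grew by {rss_mib() - before:.0f} MiB, peak {rss_mib('VmHWM'):.0f} MiB")
```

Sampling this before and after each regridder construction gives a per-step increment, which should roughly agree with memory_profiler's Increment column.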
I am going to test the new module-level ESMF.Manager().destroy() to see if it improves things.
It seems that ESMF.Manager().destroy() is still not implemented in the latest version of ESMF (I just checked with ESMF_8_0_0_beta_snapshot_40, built by this script). Fortunately, Manager has a __del__() method. For most objects, __del__() simply calls destroy(); see ESMF.Grid for an example.
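In the delegation described above, __del__() falls back to destroy(). This can be sketched with a stand-in class. `NativeHandle` and `destroying` are hypothetical: the class only counts a pretend native allocation instead of owning Fortran memory. The key property is that destroy() is idempotent, so an explicit destroy() followed later by __del__() cannot release the same memory twice. The context manager makes sure destroy() runs even if an exception is raised:

```python
from contextlib import contextmanager


class NativeHandle:
    """Hypothetical stand-in for an object owning memory allocated outside Python."""
    live = 0  # number of handles whose pretend native memory is not yet released

    def __init__(self):
        self._destroyed = False
        NativeHandle.live += 1  # pretend a native allocation happened here

    def destroy(self):
        # Idempotent: a second call is a no-op rather than a double free.
        if not self._destroyed:
            self._destroyed = True
            NativeHandle.live -= 1

    def __del__(self):
        # Fallback when the object is garbage collected without an explicit destroy().
        self.destroy()


@contextmanager
def destroying(*objs):
    """Yield objs and call destroy() on each of them on exit, in reverse order."""
    try:
        yield objs
    finally:
        for obj in reversed(objs):
            obj.destroy()


with destroying(NativeHandle(), NativeHandle()) as (grid, field):
    print(NativeHandle.live)  # 2 inside the block
print(NativeHandle.live)      # 0 after the block
```

With real ESMPy objects, the same context manager could wrap the grids, fields, and Regrid object from the script above, assuming each exposes destroy() as described.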
I added this extra code to the end of my original test script:
mg = ESMF.Manager()
mg.__del__()
Then memory_profiler gives:
69 384.5 MiB 0.0 MiB mg = ESMF.Manager()
70 201.6 MiB 0.0 MiB mg.__del__()
So __del__() frees about half of the memory, but still not all of it.
This top-level destroy also has a serious side effect: later attempts to build new regridders lead to a segmentation fault, because the connection to the Fortran internals has been lost.
Still, if memory usage becomes a problem, my current suggestion is to restart the kernel and load existing weights.
I will need to check with the ESMF team on the proper use of __del__()/destroy().
How do we restart the kernel with the xESMF API? Or should that not leak memory?
> How do we restart the kernel with the xESMF API?
I mean restarting the Python kernel and setting reuse_weights=True to load the regridder you generated previously.
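Restarting works because the operating system reclaims all of a process's memory when it exits, including anything a native library never freed. The same effect can be obtained without a manual restart by running the expensive step in a short-lived child process. The sketch below assumes a POSIX system where the "fork" start method is available. `run_isolated` is a hypothetical helper; in practice the child would build the regridder with a filename, and the parent would then load the saved weights with reuse_weights=True:

```python
import multiprocessing as mp


def _child(conn, func, args):
    # Runs in the forked child; whatever it leaks disappears when it exits.
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:  # forward the error instead of leaving the parent waiting
        conn.send(("err", repr(exc)))
    finally:
        conn.close()


def run_isolated(func, *args):
    """Call func(*args) in a forked child process and return its (picklable) result."""
    ctx = mp.get_context("fork")  # POSIX only; fork means func itself is never pickled
    parent_conn, child_conn = ctx.Pipe(duplex=False)  # receive end, send end
    proc = ctx.Process(target=_child, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent keeps only the receiving end
    status, value = parent_conn.recv()
    proc.join()
    if status == "err":
        raise RuntimeError(f"child failed: {value}")
    return value
```

Because the parent closes its copy of the sending end, a child that crashes outright (for example with a segmentation fault) makes parent_conn.recv() raise EOFError instead of taking down the interactive session.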
That doesn't seem to behave as I expected; the regridder still never seems to be freed.
Line # Mem usage Increment Line Contents
================================================
4 65.508 MiB 65.508 MiB @profile
5 def test():
6 65.508 MiB 0.000 MiB src_ds = {'lat': np.arange(29.5,70.5,0.05,dtype=np.float32), 'lon': np.arange(-23.5,45.0,0.05,dtype=np.float32)}
7 160.383 MiB 94.875 MiB dst_ds = xesmf.util.grid_2d(np.float32(29), np.float32(70), np.float32(0.03), np.float32(-23), np.float32(45), np.float32(0.03))
8 1246.383 MiB 1086.000 MiB regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear', filename="out/weights")
9 1246.383 MiB 0.000 MiB regridder = None
10 1246.387 MiB 0.004 MiB gc.collect()
11 1270.051 MiB 23.664 MiB regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear', reuse_weights=True, filename="out/weights")
12 1222.730 MiB 0.000 MiB dst_ds = None
13 1222.730 MiB 0.000 MiB src_ds = None
14 1222.730 MiB 0.000 MiB gc.collect()
15 1222.730 MiB 0.000 MiB print("done")
@Plantain Remove the first xesmf.Regridder() call from your test script.
@JiaweiZhuang @Plantain, curious whether any more work has been done on this. We just ran into this issue when running repeated tasks that use different regridders with reuse_weights=True. Even if we never call xesmf.Regridder without reuse_weights=True, our memory use grows with each new regridder built from a saved file (even if we take the previous regridder out of the namespace, e.g. by assigning each regridder to the same variable name or calling del regridder).
> our memory use builds with each call to build a new regridder from a saved file
By how much does the memory use increase?
With reuse_weights=True there is no call to ESMF.Regrid(), so the huge 400 MB allocation won't occur (see https://github.com/JiaweiZhuang/xESMF/issues/53#issuecomment-511157349). Version 0.2.0 can still leak ~10 MB due to ESMF grid objects, but this should be fixed in 0.2.1 (https://github.com/JiaweiZhuang/xESMF/commit/9963d9566ce7138c67ee6d84ee13454e36a3ebe7).
#75 should completely solve this problem. The new load_regridder() call won't involve any call into the ESMF module at all.
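Loading a saved regridder can avoid ESMF entirely because applying it is just a sparse matrix-vector product. The sketch below uses only the standard library and assumes the weights are given as the row, col, and S arrays of an ESMF offline weight file, whose indices are 1-based. `apply_weights` is a hypothetical helper operating on flattened 1-D fields; xESMF itself uses SciPy sparse matrices for this step:

```python
def apply_weights(row, col, S, src, n_dst):
    """dst[i] = sum of S[k] * src[j] over all triplets (i, j, S[k]); row/col are 1-based."""
    dst = [0.0] * n_dst
    for i, j, w in zip(row, col, S):
        dst[i - 1] += w * src[j - 1]  # convert 1-based file indices to 0-based
    return dst


# Hypothetical 2-point destination: the first point averages source points 1 and 2,
# the second copies source point 3.
row = [1, 1, 2]
col = [1, 2, 3]
S = [0.5, 0.5, 1.0]
print(apply_weights(row, col, S, [10.0, 20.0, 30.0], n_dst=2))  # [15.0, 30.0]
```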
Here's an example where I load a series of regridder files and then go back to the first one. The memory use keeps growing (for the most part). Does this seem unexpected to you?
Line # Mem usage Increment Line Contents
================================================
4 2085.2 MiB 2085.2 MiB def test(srtm_tile_ds, ds_out_grid, regridder_files):
5 2085.2 MiB 0.0 MiB gc.collect()
6 2224.0 MiB 138.8 MiB regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][0],'bilinear', filename=str(regridder_files[0]), reuse_weights=True)
7 2224.0 MiB 0.0 MiB gc.collect()
8 2321.3 MiB 97.3 MiB regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][1],'bilinear', filename=str(regridder_files[1]), reuse_weights=True)
9 2321.3 MiB 0.0 MiB gc.collect()
10 2377.0 MiB 55.7 MiB regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][2],'bilinear', filename=str(regridder_files[2]), reuse_weights=True)
11 2377.0 MiB 0.0 MiB gc.collect()
12 2432.6 MiB 55.6 MiB regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][3],'bilinear', filename=str(regridder_files[3]), reuse_weights=True)
13 2432.6 MiB 0.0 MiB gc.collect()
14 2377.1 MiB 0.0 MiB regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][0],'bilinear', filename=str(regridder_files[0]), reuse_weights=True)
15 2377.1 MiB 0.0 MiB gc.collect()
16 2488.1 MiB 111.1 MiB regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][1],'bilinear', filename=str(regridder_files[1]), reuse_weights=True)
17 2488.1 MiB 0.0 MiB gc.collect()
18 2488.1 MiB 0.0 MiB return None
> load a series of regridder files and then go back to the first regridder file.
Interesting that line 14 shows no memory increment. If this were an ESMF memory leak, there should be a steady increment.
The problem might be related to uncleaned ESMF objects, to xarray.open_dataset when reading the weight file (e.g. pydata/xarray#2186), or to Python's own garbage collection of numpy/scipy objects.
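One way to tell these candidates apart is to compare growth in the Python-managed heap with growth in process RSS. tracemalloc only sees allocations made through Python's allocators (NumPy also reports its array buffers to it). Memory retained by a native library such as ESMF would therefore show up in RSS but not in tracemalloc. `retained_python_bytes` is a hypothetical helper in this minimal sketch:

```python
import tracemalloc


def retained_python_bytes(func, *args):
    """Bytes still held on the Python heap after func(*args) returns."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        func(*args)  # the result is discarded so it cannot keep memory alive
        after, _ = tracemalloc.get_traced_memory()
    finally:
        if not was_tracing:
            tracemalloc.stop()  # leave tracing as the caller had it
    return after - before
```

Suppose RSS grows by much more than this traced figure across the same call. In that case the retained memory most likely sits outside the Python heap, for example in a native library.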
Garbage collection with NumPy seems to be a tricky issue in itself, and gc.collect() doesn't necessarily work as naively expected:
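In CPython, an object is freed as soon as its reference count drops to zero; gc.collect() only adds the ability to free objects caught in reference cycles. Setting regridder = None therefore releases nothing while some other reference still points at the object, such as a cache, a closure, or a notebook's output history. Even memory that Python does free is not necessarily returned to the operating system, so RSS can stay flat. The small demonstration below uses only the standard library, with a hypothetical Blob class and weakref to observe when the object actually dies:

```python
import gc
import weakref


class Blob:
    """Hypothetical stand-in for a large object such as a weight matrix."""
    def __init__(self):
        self.partner = None


def cycle_demo():
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        a, b = Blob(), Blob()
        a.partner, b.partner = b, a  # reference cycle
        ref = weakref.ref(a)
        del a, b
        alive_before = ref() is not None  # still alive: refcounts never reach zero
        gc.collect()
        alive_after = ref() is not None   # the cycle collector has freed it
        return alive_before, alive_after
    finally:
        gc.enable()


def lingering_demo():
    blob = Blob()
    cache = [blob]  # another reference keeps the object alive
    ref = weakref.ref(blob)
    blob = None
    gc.collect()
    return ref() is not None  # True: gc.collect() cannot free a referenced object


print(cycle_demo())      # (True, False)
print(lingering_demo())  # True
```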
https://stackoverflow.com/questions/23977904/how-to-implement-garbage-collection-in-numpy
https://stackoverflow.com/questions/16261240/releasing-memory-of-huge-numpy-array-in-ipython
If there is still a problem after #75 is implemented, then it will be a numpy/scipy/xarray issue that is out of my control.
Hi.
Is there any news regarding this issue? We are experiencing similar problems.
Our application needs to perform regridding many times, and we have found that each use of the Regridder causes a large increase in memory usage that is never released.
Has this issue come to any resolution?
Thanks
I just became aware of this issue, and thought I would chime in from the ESMPy perspective (ESMPy is the engine behind xESMF). The ESMF 8.1.0 release, expected at the end of March '21, will include a fix for a memory leak in the search algorithm of the regridding code. This may resolve the memory issues discussed in this thread. There should be a new conda package version of ESMPy 8.1.0 by the first of April.