xESMF

Memory leak when regridding multiple times

Open marchewaka opened this issue 9 months ago • 7 comments

I found something interesting when using an xESMF regridder in a loop. Despite closing all the datasets with ds.close() and calling gc.collect(), there is a memory leak: memory use accumulates with every loop run. This is a massive problem for resourcing, so I investigated a little. The solution seems to be the following:

    ds.close()
    dr_out.close()
    del ds         ### after closing all the datasets, delete the variables which will not be used in the next loop run
    del regridder
    del dr_out     ### This is particularly important because it causes the largest memory leak.
    gc.collect()   ### run garbage collection at the end of each loop run, which frees the remaining memory and restores it to the level from before the loop.

Without deleting the variables (although I always closed the datasets), the memory accumulated throughout the whole loop.

marchewaka avatar Apr 08 '25 14:04 marchewaka
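The cleanup pattern described above can be illustrated without xESMF. The following is only a minimal sketch: `StandInRegridder` and `run_loop` are hypothetical names, and the class is a placeholder for any object that holds large buffers, not xESMF's actual API. It uses weak references to check that each object is really gone once the last strong reference is deleted and garbage collection has run.

```python
import gc
import weakref


class StandInRegridder:
    """Hypothetical placeholder for an object holding large buffers (not xESMF's API)."""

    def __init__(self, n_bytes):
        self.weights = bytearray(n_bytes)


def run_loop(n_iterations, n_bytes=1_000_000):
    """Build a fresh object per iteration, then drop every reference to it."""
    refs = []
    for _ in range(n_iterations):
        regridder = StandInRegridder(n_bytes)
        # A weak reference lets us observe the object without keeping it alive.
        refs.append(weakref.ref(regridder))

        # ... use the object here ...

        del regridder  # remove the last strong reference
        gc.collect()   # also collect anything caught in reference cycles
    return refs


refs = run_loop(3)
alive = sum(ref() is not None for ref in refs)
print(f"objects still alive after the loop: {alive}")
```

This only shows how Python-level references behave. Memory allocated by a compiled library underneath may follow different rules, which is presumably why the behaviour depends on the xESMF version.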

Hmm, which version of xESMF are you using?

This looks like issue JiaweiZhuang/xESMF#53, which was partially fixed by v0.8.8. However, the fix introduced some (random ?) bug that happens when the code uses multi-processing parallelization.

Your issue made me realize that my fix to that last bug has not been released yet! I think I'll make a v0.8.9.

aulemahal avatar Apr 08 '25 15:04 aulemahal
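For anyone answering the version question, one way to read the installed version is through the standard library's package metadata. This is a small sketch; `installed_version` is a hypothetical helper name, and it assumes the distribution is installed under the name `xesmf`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when the package is absent from this environment.
print("xesmf:", installed_version("xesmf"))
```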

The xESMF version I use is 0.8.8. Thanks for looking into this!

marchewaka avatar Apr 09 '25 09:04 marchewaka

If the bug is still there with 0.8.8, then it might be a new one and not the one from the issue linked above.

Would you be able to try out your script using xESMF installed directly from the master branch of this repo? I changed the way some low-level components are freed from memory and it might help, but we're still waiting on a PR before making a release.

aulemahal avatar Apr 11 '25 20:04 aulemahal

Hi @marchewaka, I just released xESMF 0.8.9. Would you be able to test your code with that version? The way ESMF objects are destroyed has been modified so that the memory is freed earlier in the process, so it may solve the issue, or at least make the "leak" smaller. By that I mean the line del regridder shouldn't be necessary anymore.

xESMF has of course no impact on memory retained by the xarray datasets ds and ds_out (well ok, I'm less sure about ds_out, but it seems unlikely).

aulemahal avatar Apr 17 '25 13:04 aulemahal
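To compare versions, it can help to record the process memory after each loop run instead of judging by eye. The sketch below is an assumption-laden helper, not part of xESMF: `rss_kib`, `track` and `one_iteration` are hypothetical names, and the measurement reads `/proc/self/statm`, which exists on Linux only (the function returns None elsewhere). Resident memory is used because it also covers allocations made by compiled libraries, which Python-level tools may not see.

```python
import os


def rss_kib():
    """Return the resident set size of this process in KiB, or None if unavailable.

    Reads /proc/self/statm, which exists on Linux only.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE") // 1024
    except (OSError, ValueError, IndexError, AttributeError):
        return None


def track(step, n_iterations):
    """Call `step()` repeatedly and record the RSS after each call."""
    readings = []
    for _ in range(n_iterations):
        step()
        readings.append(rss_kib())
    return readings


def one_iteration():
    """Hypothetical stand-in for one loop run (open datasets, regrid, close)."""
    scratch = bytearray(5_000_000)
    del scratch


print(track(one_iteration, 5))
```

Readings that keep climbing from one run to the next suggest accumulation, while readings that level off after the first few runs suggest the memory is being reused.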

0.8.9 breaks my workflow at least with:

  File "/usr/local/lib/python3.13/dist-packages/xesmf/frontend.py", line 538, in __call__
    return self.regrid_dataarray(
           ~~~~~~~~~~~~~~~~~~~~~^
        indata,
        ^^^^^^^
    ...<3 lines>...
        output_chunks=output_chunks,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/dist-packages/xesmf/frontend.py", line 666, in regrid_dataarray
    return self._format_xroutput(dr_out, temp_horiz_dims)
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/dist-packages/xesmf/frontend.py", line 1114, in _format_xroutput
    out = out.assign_coords(self.out_coords.coords)
                            ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'coords'

I have not yet dug in any further to trace it back to a cause.

Plantain avatar Apr 25 '25 11:04 Plantain

Ach, sorry @Plantain, I see the issue now. Yet another untested case, sorry about that!

I'll push a PR soon and probably another patch release.

aulemahal avatar Apr 25 '25 15:04 aulemahal

@Plantain xESMF 0.8.10 released with a fix for your issue. Sorry again!

aulemahal avatar Apr 29 '25 17:04 aulemahal