
memory usage in scaloa

Open dksasaki opened this issue 4 years ago • 5 comments

Hey guys,

I've used your objective mapping function `scaloa` and noticed there is a simple way to reduce its memory usage.

The variables d2 and dc2 can occupy a huge amount of memory, so deleting them after both the correlation and cross-correlation matrices (A and C, respectively) are built, and before the matrix inversion, is useful. In one of my cases it freed up a few GB of memory (the exact amount depends, of course, on both the grid and the data).

(...)
    d2 = ((np.tile(x, (n, 1)).T - np.tile(x, (n, 1))) ** 2 +
          (np.tile(y, (n, 1)).T - np.tile(y, (n, 1))) ** 2)
    nv = len(xc)
    xc, yc = np.reshape(xc, (1, nv)), np.reshape(yc, (1, nv))
    # Squared distance between the observations and the grid points.
    dc2 = ((np.tile(xc, (n, 1)).T - np.tile(x, (nv, 1))) ** 2 +
           (np.tile(yc, (n, 1)).T - np.tile(y, (nv, 1))) ** 2)
    # Correlation matrix between stations (A) and cross-correlation matrix
    # between stations and grid points (C).
    A = (1 - err) * np.exp(-d2 / corrlen ** 2)
    C = (1 - err) * np.exp(-dc2 / corrlen ** 2)
    if 0:  # NOTE: if the parameter zc is used (`scaloa2.m`)
        A = (1 - d2 / zc ** 2) * np.exp(-d2 / corrlen ** 2)
        C = (1 - dc2 / zc ** 2) * np.exp(-dc2 / corrlen ** 2)

    # here!!  <---------- free the distance matrices before the inversion
    del d2, dc2
        
(...)
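
For a sense of scale (illustrative numbers, not from my actual run): d2 is an n-by-n float64 array and dc2 is nv-by-n, so together they hold 8 * n * (n + nv) bytes:

    # Back-of-envelope estimate of the two distance matrices
    # (hypothetical sizes; the real savings depend on grid and data).
    n, nv = 20_000, 50_000   # observations, grid points
    gb = 1e9
    print(n * n * 8 / gb)    # d2  ~ 3.2 GB
    print(nv * n * 8 / gb)   # dc2 ~ 8.0 GB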

dksasaki commented on Aug 11 '21

Hi @dksasaki,

Thanks for raising this issue.

Do you mean that there is a memory leak after running the function, or just that cleaning these variables before running the rest of the interpolation reduces the peak memory usage?

We could check how other packages usually deal with this problem and see whether `del` is the best solution; one common alternative is sketched below.
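
For instance, a pattern used elsewhere is to build A and C inside a small helper function, so the distance matrices are freed automatically when it returns. A minimal sketch (the helper name is mine, and I am assuming 1-D coordinate arrays; it also uses broadcasting instead of np.tile, which avoids materializing the tiled copies):

    import numpy as np

    def _build_matrices(x, y, xc, yc, corrlen, err):
        # d2 and dc2 exist only inside this scope, so Python frees them
        # automatically when the function returns (no explicit del needed).
        d2 = (x[:, None] - x[None, :]) ** 2 + (y[:, None] - y[None, :]) ** 2
        dc2 = (xc[:, None] - x[None, :]) ** 2 + (yc[:, None] - y[None, :]) ** 2
        A = (1 - err) * np.exp(-d2 / corrlen ** 2)
        C = (1 - err) * np.exp(-dc2 / corrlen ** 2)
        return A, C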

@dantecn and I were also thinking about adding an option to break the grid points into blocks, trading some performance for lower memory usage.
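
Roughly like this (just a sketch: I am assuming scaloa(xc, yc, x, y, t, corrlen, err) returns the mapped values for the grid points, which may not match the real signature exactly):

    import numpy as np

    def scaloa_blocks(xc, yc, x, y, t, corrlen, err, nblocks=10):
        # Map each block of grid points separately, so the cross-correlation
        # matrix C is never built for the whole grid at once.
        out = [scaloa(xc_b, yc_b, x, y, t, corrlen, err)
               for xc_b, yc_b in zip(np.array_split(xc, nblocks),
                                     np.array_split(yc, nblocks))]
        return np.concatenate(out)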

We could also simply add an example of this to the documentation.

iuryt commented on Aug 17 '21

Hi @iuryt,

There is no memory leak. While the method runs, these extra matrices can contribute significantly to memory usage, making the peak even worse. The `del` was just a quick fix I added, but given the simplicity of this solution, I wonder what problems could arise from this choice.

Breaking the grid into chunks is a good idea, although the whole processing gets slower because of the multiple matrix inversions. Let me know if you plan to implement it; I have written a few lines that may help.

dksasaki commented on Aug 17 '21

If you want to implement breaking into blocks, go ahead. You can add an argument like `nblocks=None` to `scaloa` and `vectoa`. Despite losing some performance, I believe this is a nice way to avoid memory overload. You could also add a `verbose=False` argument that activates a progress bar for the block-wise interpolation.
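
Something like this for the argument plumbing (a sketch only; `_scaloa_single` stands for the current implementation factored out, and tqdm is just one option for the progress bar):

    import numpy as np
    from tqdm import tqdm  # optional, only needed when verbose=True

    def scaloa(xc, yc, x, y, t, corrlen, err, nblocks=None, verbose=False):
        # nblocks=None keeps today's single-shot behaviour.
        if nblocks is None:
            return _scaloa_single(xc, yc, x, y, t, corrlen, err)
        blocks = list(zip(np.array_split(xc, nblocks),
                          np.array_split(yc, nblocks)))
        if verbose:
            blocks = tqdm(blocks, desc="scaloa blocks")
        return np.concatenate([_scaloa_single(xc_b, yc_b, x, y, t, corrlen, err)
                               for xc_b, yc_b in blocks])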

Can you check how other packages such as xarray deal with cleaning up memory? I believe @Ryukamusa may be the best person in the group to look into that as well.

Once you have made some of the modifications in your forked repo, you can open a pull request and link it to this issue.

Please let me know if you have any questions; we just started the group, and we are still learning how to manage the development process here.

iuryt commented on Aug 17 '21

It turns out that I came back here for some reason. I just think we could make this package better by making it work with xarray; that would make it easy to parallelize it, or to run it lazily with dask when memory is tight.
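
For example (a rough sketch of what I mean, not working code in the package; `scaloa_block` stands for whatever per-block mapping function we end up with):

    import dask
    import numpy as np
    import xarray as xr

    # Build one delayed objective-mapping task per grid block, compute them
    # in parallel (or lazily, to keep memory low), then wrap with xarray.
    nblocks = 16
    tasks = [dask.delayed(scaloa_block)(xc_b, yc_b, x, y, t, corrlen, err)
             for xc_b, yc_b in zip(np.array_split(xc, nblocks),
                                   np.array_split(yc, nblocks))]
    tp = np.concatenate(dask.compute(*tasks))
    da = xr.DataArray(tp, dims="point",
                      coords={"lon": ("point", xc), "lat": ("point", yc)})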

iuryt commented on Jul 18 '23

Sorry for not replying; I also forgot about this issue. I developed a way to make this piece faster without using as much memory: for each grid point, I basically only consider data within a certain distance. Not sure yet how to use dask and xarray with it, but we can give it a try.
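
The gist is something like this (a simplified sketch of the idea rather than my exact code; `map_local` stands for the objective-analysis solve on the local subset, and `radius` is the cutoff distance):

    import numpy as np
    from scipy.spatial import cKDTree

    # For each grid point, select only the observations within `radius`.
    # Each local system stays small, so the full n-by-n matrices never
    # need to be built, and the many small inversions are cheap.
    tree = cKDTree(np.column_stack([x, y]))
    tp = np.full(len(xc), np.nan)
    for i, (xi, yi) in enumerate(zip(xc, yc)):
        idx = tree.query_ball_point([xi, yi], r=radius)
        if idx:  # leave NaN where no data falls inside the radius
            tp[i] = map_local(x[idx], y[idx], t[idx], xi, yi, corrlen, err)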

dksasaki commented on Jul 18 '23