
Suspected memory leak in highspy (Python 3.10)

yokawhhh opened this issue

As I solve a large MIP model over multiple datasets in a Jupyter notebook on WSL on Windows, the memory it uses keeps increasing. Eventually the task fails because it exhausts the available memory (8 GB). But if I run the notebooks one at a time, with each one solving only a single dataset, everything works.

I suspect the memory is not completely released when a model finishes solving.

If you need more information, tell me how to collect it.

yokawhhh avatar Dec 11 '24 12:12 yokawhhh

What version of HiGHS are you using?

jajhall avatar Dec 11 '24 12:12 jajhall

highspy 1.8.1

yokawhhh avatar Dec 11 '24 12:12 yokawhhh

Would you be able to share your notebook and data? This way I can try to reproduce the behaviour locally on my Windows machine with WSL.

galabovaa avatar Dec 11 '24 13:12 galabovaa

Sorry for the late reply; I have been seriously ill recently. I have run another experiment: in one notebook, I repeatedly called the following function, which solves n-queens, 40 times, and the memory usage grew by about 1 GB. The memory used by the function should be freed after each call.

import highspy
import numpy as np

def nqueens(N):
    h = highspy.Highs()
    h.silent()
    x = h.addBinaries(N, N)
    h.addConstrs(x.sum(axis=0) == 1)    # each column has exactly one queen
    h.addConstrs(x.sum(axis=1) == 1)    # each row has exactly one queen
    y = np.fliplr(x)
    h.addConstrs(x.diagonal(k).sum() <= 1 for k in range(-N + 1, N))   # each diagonal has at most one queen
    h.addConstrs(y.diagonal(k).sum() <= 1 for k in range(-N + 1, N))   # each 'reverse' diagonal has at most one queen
    h.solve()
    sol = h.vals(x)

[screenshot showing memory usage growing over the repeated runs]

yokawhhh avatar Jan 08 '25 03:01 yokawhhh

Hi @yokawhhh, @galabovaa,

I can reproduce this issue with the following test script, both on Windows WSL and on a physical Linux machine

test-issue-2080.py

Windows WSL (Ubuntu)

               total        used        free      shared  buff/cache   available
Mem:            31Gi       869Mi        30Gi       3.6Mi       243Mi        30Gi
Swap:          8.0Gi          0B       8.0Gi
[...]
               total        used        free      shared  buff/cache   available
Mem:            31Gi       1.6Gi        29Gi       3.6Mi       244Mi        29Gi
Swap:          8.0Gi          0B       8.0Gi

Physical Linux machine (Ubuntu)

               total        used        free      shared  buff/cache   available
Mem:            31Gi        17Gi        10Gi       116Mi       3.0Gi        13Gi
Swap:          2.0Gi       6.1Mi       2.0Gi
[...]
               total        used        free      shared  buff/cache   available
Mem:            31Gi        18Gi        10Gi       116Mi       3.0Gi        12Gi
Swap:          2.0Gi       6.1Mi       2.0Gi

(In case it's significant, note that I only see the decrease in the available column, not the free column)

I also seem to get the same behavior even if I move h = highspy.Highs() and h.silent() outside the loop and call h.clearModel() at the beginning of each pass of the loop instead.
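
For reference, this is roughly the variant I mean (a minimal sketch rather than the attached script; the small model here is just a placeholder for the real one):

import highspy

# Reuse a single Highs instance across passes and clear the model each time,
# instead of constructing a fresh Highs object per iteration.
h = highspy.Highs()
h.silent()
for i in range(40):
    h.clearModel()                       # reset the incumbent model
    x = h.addBinaries(8, 8)              # placeholder model; the real test builds a larger MIP
    h.addConstrs(x.sum(axis=0) == 1)
    h.addConstrs(x.sum(axis=1) == 1)
    h.solve()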

BenChampion avatar Oct 20 '25 15:10 BenChampion

I have a new MRE for this (using nqueens300x300.mps.txt)

import highspy
import subprocess

def nqueens300x300():
    h = highspy.Highs()
    h.silent()
    h.readModel("nqueens300x300.mps")
    h.run()

for i in range(40):
    print(i)
    nqueens300x300()
    subprocess.run(["free", "-m" ])

Adds about 300 MB to memory usage by the end. Less than half of the previous example but still noticeable.

The following C++ does not seem to have this effect. (As we should hope!)

#include <cstdlib>
#include <iostream>
#include "Highs.h"

int main(){
    Highs highs;
    for (int i = 0; i < 40; ++i) {
        std::cout << i << std::endl;
        highs.setOptionValue("log_to_console", "false");
        highs.readModel("nqueens300x300.mps");
        highs.run();
        std::system("free -m");
    }
}

Any ideas @mathgeekcoder, even just for where to dig?

BenChampion avatar Nov 05 '25 18:11 BenChampion

Thanks @BenChampion for the heads up. I had a quick look and can somewhat reproduce too. I'm not seeing it with the .mps file via Python, but I do see it with the highspy construction of n-queens. I've not tried C++ yet. I've tested with WSL on Windows.

I think there are multiple issues at play here.

  • I found a bug in highspy (my fault!), which causes a cyclic reference to the highs object that might prevent the garbage collector from cleaning everything up. However, fixing this makes no difference to the memory leak.

  • The "leak" is much smaller (practically zero) if you don't actually solve the problem.

  • The "leak" is also practically zero if you solve the LP relaxation instead of the IP.

  • The "leak" occurs regardless of whether I use the "pythonic" wrappers or the raw C++ bindings via Python.

  • gc.collect() doesn't free everything; I also needed to call malloc_trim(0) to see the memory drop.

I'm using psutil to report the memory usage for my process (so it isolates memory usage):

def memory():
    import os
    import psutil
    # Get the current process
    process = psutil.Process(os.getpid())
    # Retrieve memory usage in MB
    memory_usage_mb = process.memory_info().rss / (1024 * 1024)
    print(f"Memory Usage: {memory_usage_mb:.2f} MB")

I'll continue to debug too. It's an interesting one!

mathgeekcoder avatar Nov 06 '25 00:11 mathgeekcoder

Thanks for looking @mathgeekcoder!

With your snippet and calling memory() at the end of the body of the for loop, I see similar behavior with the .mps file as I did with calling free -m directly. The memory usage gets to 1.1 GB on my WSL system on Windows and only 0.6 GB on a physical Linux box, both much larger than the 300 MB I reported previously. (There may be other confounding variables too; I haven't ensured matching Python versions etc.)

One drawback of my MRE is that I don't check the return value of readModel. If it doesn't find the .mps file it happily continues and of course doesn't manifest the increasing memory usage (and terminates quite quickly).
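
For reference, a guarded version of that call might look like the sketch below (assuming readModel returns a HighsStatus, as in the C++ API):

import highspy

h = highspy.Highs()
h.silent()
# Check that the model file was actually read; otherwise run() would solve an empty model.
status = h.readModel("nqueens300x300.mps")
if status != highspy.HighsStatus.kOk:
    raise RuntimeError("failed to read nqueens300x300.mps")
h.run()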

I think there are multiple issues at play here.

That sounds likely to me. On both test machines I did notice the memory usage plateauing rather than uniformly increasing after each pass through the loop.

BenChampion avatar Nov 06 '25 12:11 BenChampion

Ah!! You're correct @BenChampion, it wasn't finding the .mps file. My silly mistake. Fixing that helped replicate the issue with mps.

That said, I believe I can also replicate this in C++, and I think threading, garbage collection and glibc are the "cause".

The main issue is threading (gc and glibc just made it harder to see). I still need to investigate why the threading on Linux is keeping hold of the memory (valgrind doesn't notice a leak for me).

Numbers below are in MB. Manually forcing Python's garbage collection (gc) cleans up Python objects, while calling malloc_trim(0) releases memory that glibc is holding onto for performance reasons.

iteration   original   original gc/malloc   1 thread   1 thread gc   1 thread gc/malloc
0           189        49                   135        132           31
39          889        357                  351        140           30

import highspy
import gc
import ctypes
import os
import psutil
malloc_trim = ctypes.CDLL("libc.so.6").malloc_trim

def memory():
    process = psutil.Process(os.getpid())
    memory_usage_mb = process.memory_info().rss / (1024 * 1024)
    print(f"{memory_usage_mb:.2f} MB")

def nqueens300x300():
    h = highspy.Highs()
    h.silent()
    h.setOptionValue("threads", 1)
    h.readModel("nqueens300x300.mps")
    h.run()
    #highspy._Highs.resetGlobalScheduler(True)  # doesn't seem to help

for i in range(40):
    print(i, end='\t')
    nqueens300x300()

    gc.collect()
    malloc_trim(0)
    memory()

mathgeekcoder avatar Nov 06 '25 23:11 mathgeekcoder

Okay, so I think I've worked out the threading issue. Though, now I'm not sure if this is the same problem in the original ticket.

@BenChampion can you try running export MALLOC_ARENA_MAX=1 before running your Python script? This is not a fix, but it might help determine what's going on.

When I do this, I get the following:

iteration   original   original gc/malloc   1 thread   1 thread gc   1 thread gc/malloc
0           164        31                   159        156           30
39          453        30                   456        165           30

That is, memory doesn't increase even with multiple threads (after we run garbage collection and release the glibc cache).

mathgeekcoder avatar Nov 07 '25 02:11 mathgeekcoder

My silly mistake. Fixing that helped replicate the issue with mps

My bad for laziness around error handling!

After export MALLOC_ARENA_MAX=1, adding an "original gc" column, and running on a physical Linux machine (not WSL)

iteration   original   original gc   original gc/malloc   1 thread   1 thread gc   1 thread gc/malloc
0           132        132           27                   117        117           26
39          454        160           28                   433        112           26

(That is, similar results.)

Did you manage to reproduce this in C++?

And just to make sure, is the following summary/interpretation of our findings so far correct?

  • Although it seems to make no difference in our tests, there's a cyclic dependency highspy is creating that might stop Python from freeing memory in some cases. (For my own interest, could you point me to the relevant line(s)?)
  • Otherwise, it looks like most of the symptoms in our tests are coming from Python and glibc management of memory that is "in theory" available.

BenChampion avatar Nov 07 '25 10:11 BenChampion

(That is, similar results.)

Thanks for confirming @BenChampion!

Did you manage to reproduce this in C++?

Yes.

  • Although it seems to make no difference in our tests, there's a cyclic dependency highspy is creating that might stop Python from freeing memory in some cases. (For my own interest, could you point me to the relevant line(s)?)

Yes: HighsCallback.highs. Instead of pointing directly to the relevant highs object, it probably should use weakref.ref(highs). There's also HighspyArray.highs, though only the callback has the cyclic dependency. That said, this cyclic dependency issue could be avoided if the user calls clearCallbacks etc. once they're done - but that's not particularly nice.
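
(For concreteness, the pattern I have in mind is roughly the sketch below; the class is heavily simplified and is not the actual highspy source:)

import weakref

class HighsCallback:
    def __init__(self, highs):
        # Hold a weak reference instead of the Highs object itself, so the callback
        # does not keep the owning Highs instance alive through a reference cycle.
        self._highs_ref = weakref.ref(highs)

    @property
    def highs(self):
        h = self._highs_ref()
        if h is None:
            raise ReferenceError("the owning Highs object has been garbage collected")
        return h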

  • Otherwise, it looks like most of the symptoms in our tests are coming from Python and glibc management of memory that is "in theory" available.

That's my understanding too. It's not really a bug, but it's using more memory than people might expect.

This might also be the original issue, but I'd expect the Python garbage collector and glibc clean-up to kick in before you run out of memory. It's not a memory leak; it's a side-effect of our threading model and the glibc allocator.

There are a few things we could do if we wanted to avoid this behaviour: limit the glibc arena count programmatically, reconsider how we do work-stealing etc. across our threads, or use a different malloc allocator, which might also bring performance benefits (see #2476).

FYI: I've tried a few of them via my C++ test, i.e., injecting the different allocators via:

LD_PRELOAD=lib*malloc.so ./highs

iterations   glibc    glibc ARENA=1   mimalloc   jemalloc   tcmalloc
0            163 MB   138 MB          209 MB     142 MB     218 MB
39           626 MB   148 MB          254 MB     203 MB     241 MB

One challenge is that the arena concept helps speed up allocations across multiple threads (so we probably want more than one), but we also have work stealing that might fragment the memory allocations across the different arena heaps. The other allocators have similar concepts, and we could possibly tune whatever one we wanted to fit our needs best.
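
For the "limit the glibc arena programmatically" option, the rough shape from Python would be something like the sketch below. This is glibc-specific and untested here: M_ARENA_MAX is -8 in glibc's malloc.h, and mallopt must run before worker threads trigger the creation of extra arenas.

import ctypes

libc = ctypes.CDLL("libc.so.6")
M_ARENA_MAX = -8  # glibc's M_ARENA_MAX constant from malloc.h

# Limit glibc to a single malloc arena for this process.
# mallopt returns 1 on success and 0 on failure.
if libc.mallopt(M_ARENA_MAX, 1) != 1:
    print("mallopt(M_ARENA_MAX, 1) failed or is not supported by this libc")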

mathgeekcoder avatar Nov 07 '25 21:11 mathgeekcoder

Great to see @BenChampion and @mathgeekcoder sparking off each other to investigate this! 🤩

jajhall avatar Nov 08 '25 00:11 jajhall

Yes: HighsCallback.highs. Instead of pointing directly to the relevant highs object, it probably should use weakref.ref(highs). There's also HighspyArray.highs, though only the callback has the cyclic dependency. That said, this cyclic dependency issue could be avoided if the user calls clearCallbacks etc. once they're done - but that's not particularly nice.

I can create a new issue for this. (I can also try making the required changes.)

This might also be the original issue, but I'd imagine the python garbage collector and glibc clean-up kicking in before you run out of memory.

I would have thought that too, but it does seem like glibc isn't always that smart and can still exhaust memory.

An example of this happening in practice in another project (avoiding the backlink, I hope, by inserting www.!)

In any case, I propose we close this issue for now since there doesn't appear to be an actual memory leak and the above investigations provide several potential workarounds for affected users.

BenChampion avatar Nov 10 '25 11:11 BenChampion

I can create a new issue for this. (I can also try making the required changes.)

Great! Should be fairly straightforward. I'm happy to review the PR.

I would have thought that too, but it does seem like glibc isn't always that smart and can still exhaust memory.

In any case, I propose we close this issue for now since there doesn't appear to be an actual memory leak and the above investigations provide several potential workarounds for affected users.

Yeah, I agree. That said, I think it'll be worth revisiting memory allocation and threading in the future for better performance and reduced memory overhead/fragmentation. Sounds rather fascinating, so I'll add that to my list of investigation TODOs :)

mathgeekcoder avatar Nov 10 '25 14:11 mathgeekcoder