Suspected memory leak in highspy (Python 3.10)
While solving a large MIP model over several datasets in a Jupyter notebook under WSL on Windows, the memory usage kept increasing. Eventually the task fails because it uses up all the memory (8 GB). But if I run the notebooks one at a time, each solving just one dataset, everything works fine.
I suspect the memory is not completely released when a model finishes solving.
If you need more information, let me know how to collect it and I will provide it.
What version of HiGHS are you using?
highspy 1.8.1
Would you be able to share your notebook and data? That way I can try to reproduce the behaviour locally on my Windows machine with WSL.
Sorry for the late reply; I have been seriously ill recently. I have done another experiment: in one notebook I repeatedly ran the following function, which solves an n-queens model, 40 times, and the memory usage grew by about 1 GB. The memory used by the function should be freed after each call.
import highspy
import numpy as np

def nqueens(N):
    h = highspy.Highs()
    h.silent()

    x = h.addBinaries(N, N)
    h.addConstrs(x.sum(axis=0) == 1)  # each row has exactly one queen
    h.addConstrs(x.sum(axis=1) == 1)  # each col has exactly one queen

    y = np.fliplr(x)
    h.addConstrs(x.diagonal(k).sum() <= 1 for k in range(-N + 1, N))  # each diagonal has at most one queen
    h.addConstrs(y.diagonal(k).sum() <= 1 for k in range(-N + 1, N))  # each 'reverse' diagonal has at most one queen

    h.solve()
    sol = h.vals(x)
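A driver loop for this experiment might look like the following (a minimal sketch reusing the nqueens function above; the board size of 30 and the free -m call are illustrative additions, not taken from the original report):

import subprocess

for i in range(40):
    print(i)
    nqueens(30)                     # the local Highs object should be released on return
    subprocess.run(["free", "-m"])  # watch system memory after each call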
Hi @yokawhhh, @galabovaa,
I can reproduce this issue with the following test script on Windows WSL and on a physical Linux machine.
Windows WSL (Ubuntu)
total used free shared buff/cache available
Mem: 31Gi 869Mi 30Gi 3.6Mi 243Mi 30Gi
Swap: 8.0Gi 0B 8.0Gi
[...]
total used free shared buff/cache available
Mem: 31Gi 1.6Gi 29Gi 3.6Mi 244Mi 29Gi
Swap: 8.0Gi 0B 8.0Gi
Physical Linux machine (Ubuntu)
total used free shared buff/cache available
Mem: 31Gi 17Gi 10Gi 116Mi 3.0Gi 13Gi
Swap: 2.0Gi 6.1Mi 2.0Gi
[...]
total used free shared buff/cache available
Mem: 31Gi 18Gi 10Gi 116Mi 3.0Gi 12Gi
Swap: 2.0Gi 6.1Mi 2.0Gi
(In case it's significant, note that I only see the decrease in the available column, not the free column)
I also seem to get the same behavior even if I move h = highspy.Highs() and h.silent() outside the loop and call h.clearModel() at the beginning of each pass of the loop instead.
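Roughly, that variant looks like this (a sketch only, reusing the model construction from the function above; N and the iteration count are illustrative):

import highspy
import numpy as np

N = 30  # illustrative board size
h = highspy.Highs()
h.silent()
for i in range(40):
    h.clearModel()  # drop the previous model before rebuilding it
    x = h.addBinaries(N, N)
    h.addConstrs(x.sum(axis=0) == 1)
    h.addConstrs(x.sum(axis=1) == 1)
    y = np.fliplr(x)
    h.addConstrs(x.diagonal(k).sum() <= 1 for k in range(-N + 1, N))
    h.addConstrs(y.diagonal(k).sum() <= 1 for k in range(-N + 1, N))
    h.solve()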
I have a new MRE for this (using nqueens300x300.mps.txt)
import highspy
import subprocess

def nqueens300x300():
    h = highspy.Highs()
    h.silent()
    h.readModel("nqueens300x300.mps")
    h.run()

for i in range(40):
    print(i)
    nqueens300x300()
    subprocess.run(["free", "-m"])
Adds about 300 MB to memory usage by the end. Less than half of the previous example but still noticeable.
The following C++ does not seem to have this effect. (As we should hope!)
#include <cstdlib>
#include <iostream>

#include "Highs.h"

int main() {
  Highs highs;
  for (int i = 0; i < 40; ++i) {
    std::cout << i << std::endl;
    highs.setOptionValue("log_to_console", "false");
    highs.readModel("nqueens300x300.mps");
    highs.run();
    std::system("free -m");
  }
}
Any ideas @mathgeekcoder, even just for where to dig?
Thanks @BenChampion for the heads up. I had a quick look and can somewhat reproduce too. I'm not seeing it with the mps file via Python, but I do see it with the highspy construction of nqueens. I've not tried C++ yet. I've tested with WSL on Windows.
I think there's multiple issues at play here.
- I found a bug in highspy (my fault!), which causes a cyclic reference to the Highs object that might prevent the garbage collector from cleaning everything up. However, fixing this doesn't make any difference to the memory leak.
- The "leak" is much less (practically zero) if you don't actually solve the problem.
- The "leak" is also practically zero if you solve the LP relaxation instead of the IP.
- The "leak" occurs regardless of whether I use the "pythonic" wrappers or the raw C++ bindings via Python.
- gc.collect() doesn't free everything; I also needed to call malloc_trim(0) to see the memory drop.
I'm using psutil to report the memory usage of my process (so it isolates this process's usage from the rest of the system):
def memory():
    import os
    import psutil

    # Get the current process
    process = psutil.Process(os.getpid())

    # Retrieve memory usage in MB
    memory_usage_mb = process.memory_info().rss / (1024 * 1024)
    print(f"Memory Usage: {memory_usage_mb:.2f} MB")
I'll continue to debug too. It's an interesting one!
Thanks for looking @mathgeekcoder!
With your snippet, calling memory() at the end of the for-loop body, I see similar behavior with the .mps file as I did when calling free -m directly. The memory usage gets to 1.1 GB on my WSL system on Windows and only 0.6 GB on a physical Linux box, both much larger than the 300 MB I reported previously. (There may be other confounding variables too; I haven't ensured matching Python versions etc.)
One drawback of my MRE is that I don't check the return value of readModel. If it doesn't find the .mps file it happily continues and of course doesn't manifest the increasing memory usage (and terminates quite quickly).
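For completeness, the missing check would be something like the following sketch (assuming readModel returns a HighsStatus; the choice of exception is mine):

import highspy

h = highspy.Highs()
h.silent()
status = h.readModel("nqueens300x300.mps")  # check the status instead of ignoring it
if status != highspy.HighsStatus.kOk:
    raise FileNotFoundError("could not read nqueens300x300.mps")
h.run()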
I think there's multiple issues at play here.
That sounds likely to me. On both test machines I did notice the memory usage plateauing rather than uniformly increasing after each pass through the loop.
Ah!! You're correct @BenChampion, it wasn't finding the .mps file. My silly mistake. Fixing that helped replicate the issue with mps.
That said, I believe I can also replicate this in C++, and I think threading, garbage collection and glibc are the "cause".
The main issue is threading (gc and glibc just made it harder to see). I still need to investigate why the threading on Linux is keeping hold of the memory (valgrind doesn't report a leak for me).
Numbers below are in MB. Manually forcing Python's garbage collection (gc) cleans up Python objects, while calling malloc_trim(0) releases memory that glibc is holding onto for performance reasons.
| iteration | original | original gc/malloc | 1 thread | 1 thread gc | 1 thread gc/malloc |
|---|---|---|---|---|---|
| 0 | 189 | 49 | 135 | 132 | 31 |
| 39 | 889 | 357 | 351 | 140 | 30 |
import highspy
import gc
import ctypes
import os
import psutil

malloc_trim = ctypes.CDLL("libc.so.6").malloc_trim

def memory():
    process = psutil.Process(os.getpid())
    memory_usage_mb = process.memory_info().rss / (1024 * 1024)
    print(f"{memory_usage_mb:.2f} MB")

def nqueens300x300():
    h = highspy.Highs()
    h.silent()
    h.setOptionValue("threads", 1)
    h.readModel("nqueens300x300.mps")
    h.run()
    #highspy._Highs.resetGlobalScheduler(True) # doesn't seem to help

for i in range(40):
    print(i, end='\t')
    nqueens300x300()
    gc.collect()
    malloc_trim(0)
    memory()
Okay, so I think I've worked out the threading issue. Though now I'm not sure whether this is the same problem as in the original ticket.
@BenChampion can you try running export MALLOC_ARENA_MAX=1 before running your Python script? This is not a fix, but it might help determine what's going on.
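If it's easier, the same thing can be done by launching the test in a child process with the variable set (a sketch; "leak_test.py" is a hypothetical name for the MRE, and glibc reads the variable at start-up, so it can't usefully be changed inside an already-running process):

import os
import subprocess

# Run the MRE with glibc restricted to a single malloc arena.
env = dict(os.environ, MALLOC_ARENA_MAX="1")
subprocess.run(["python3", "leak_test.py"], env=env)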
When I do this, I get the following:
| iteration | original | original gc/malloc | 1 thread | 1 thread gc | 1 thread gc/malloc |
|---|---|---|---|---|---|
| 0 | 164 | 31 | 159 | 156 | 30 |
| 39 | 453 | 30 | 456 | 165 | 30 |
That is, memory doesn't increase even with multiple threads (after we force garbage collection and release the glibc cache).
My silly mistake. Fixing that helped replicate the issue with mps
My bad for laziness around error handling!
After export MALLOC_ARENA_MAX=1, adding an "original gc" column, and running on a physical Linux machine (not WSL)
| iteration | original | original gc | original gc/malloc | 1 thread | 1 thread gc | 1 thread gc/malloc |
|---|---|---|---|---|---|---|
| 0 | 132 | 132 | 27 | 117 | 117 | 26 |
| 39 | 454 | 160 | 28 | 433 | 112 | 26 |
(That is, similar results.)
Did you manage to reproduce this in C++?
And just to make sure, is the following summary/interpretation of our findings so far correct?
- Although it seems to make no difference in our tests, there's a cyclic dependency highspy is creating that might stop Python from freeing memory in some cases. (For my own interest, could you point me to the relevant line(s)?)
- Otherwise, it looks like most of the symptoms in our tests are coming from Python and glibc management of memory that is "in theory" available.
(That is, similar results.)
Thanks for confirming @BenChampion!
Did you manage to reproduce this in C++?
Yes.
- Although it seems to make no difference in our tests, there's a cyclic dependency highspy is creating that might stop Python from freeing memory in some cases. (For my own interest, could you point me to the relevant line(s)?)
Yes: HighsCallback.highs. Instead of pointing directly to the relevant highs object, it probably should use weakref.ref(highs). There's also HighspyArray.highs, though only the callback has the cyclic dependency. That said, this cyclic dependency issue could be avoided if the user calls clearCallbacks etc. once they're done - but that's not particularly nice.
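For illustration, the weakref pattern being suggested looks roughly like this minimal sketch (toy classes, not the actual highspy implementation):

import weakref

class OwnerSketch:
    """Toy stand-in for the Highs object."""
    pass

class CallbackSketch:
    """Toy stand-in for HighsCallback, holding its owner weakly."""
    def __init__(self, owner):
        # weakref.ref breaks the owner -> callback -> owner cycle, so the owner
        # can be freed by reference counting instead of the cycle collector.
        self._owner_ref = weakref.ref(owner)

    @property
    def owner(self):
        return self._owner_ref()  # None once the owner has been collected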
- Otherwise, it looks like most of the symptoms in our tests are coming from Python and glibc management of memory that is "in theory" available.
That's my understanding too. It's not really a bug, but it's using more memory than people might expect.
This might also be the original issue, but I'd imagine the Python garbage collector and glibc clean-up would kick in before you run out of memory. It's not a memory leak; it's a side-effect of our threading model and the glibc allocator.
There are a few things we could do if we wanted to avoid this behaviour: limit the number of glibc arenas programmatically, reconsider how we do work stealing across our threads, or use a different malloc allocator; there are potential performance benefits to the latter too (see #2476).
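As a rough illustration of the first option, glibc can be limited to a single arena programmatically with mallopt; here is a Python sketch using ctypes (glibc-specific, M_ARENA_MAX is taken from glibc's malloc.h, and the call needs to happen before worker threads start allocating):

import ctypes

libc = ctypes.CDLL("libc.so.6")
M_ARENA_MAX = -8  # mallopt parameter defined in glibc's malloc.h

# Equivalent in effect to running with MALLOC_ARENA_MAX=1.
libc.mallopt(M_ARENA_MAX, 1)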
FYI: I've tried a few of them via my C++ test, i.e., injecting the different allocators via:
LD_PRELOAD=lib*malloc.so ./highs
| iterations | glibc | glibc ARENA=1 | mimalloc | jemalloc | tcmalloc |
|---|---|---|---|---|---|
| 0 | 163 MB | 138 MB | 209 MB | 142 MB | 218 MB |
| 39 | 626 MB | 148 MB | 254 MB | 203 MB | 241 MB |
One challenge is that the arena concept helps speed up allocations across multiple threads (so we probably want more than one), but we also have work stealing that can fragment allocations across the different arena heaps. The other allocators have similar concepts, and we could tune whichever one we chose to fit our needs best.
Great to see @BenChampion and @mathgeekcoder sparking off each other to investigate this! 🤩
Yes: HighsCallback.highs. Instead of pointing directly to the relevant highs object, it probably should use weakref.ref(highs). There's also HighspyArray.highs, though only the callback has the cyclic dependency. That said, this cyclic dependency issue could be avoided if the user calls clearCallbacks etc. once they're done - but that's not particularly nice.
I can create a new issue for this. (I can also try making the required changes.)
This might also be the original issue, but I'd imagine the python garbage collector and glibc clean-up kicking in before you run out of memory.
I would have thought that too, but it does seem like glibc isn't always that smart and can still exhaust memory.
An example of this happening in practice in another project (avoiding the backlink, I hope, by inserting www.!)
In any case, I propose we close this issue for now since there doesn't appear to be an actual memory leak and the above investigations provide several potential workarounds for affected users.
I can create a new issue for this. (I can also try making the required changes.)
Great! Should be fairly straightforward. I'm happy to review the PR.
I would have thought that too, but it does seem like glibc isn't always that smart and can still exhaust memory.
In any case, I propose we close this issue for now since there doesn't appear to be an actual memory leak and the above investigations provide several potential workarounds for affected users.
Yeah, I agree. That said, I think it'll be worth revisiting memory allocation and threading in the future for better performance and reduced memory overhead/fragmentation. Sounds rather fascinating, so I'll add that to my list of investigation TODOs :)