ZMQ.jl

memory leak

[Open] StefanKarpinski opened this issue 9 years ago • 26 comments

Example code:

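# Print this process's virtual memory size (VSZ, in KiB) as reported by `ps`.
# (Julia 0.3/0.4-era code: `readall` has since been removed from Base.)
pid = getpid()
vsz(s) = println(s*split(open(readall,`ps -p $pid -o vsz`),"\n")[2])
vsz("Initial VSZ=")

using ZMQ
vsz("After loading ZMQ, my VSZ=")

# Bind a PUB socket to a local IPC endpoint.
ctx = Context()
socket = Socket(ctx, PUB)
ZMQ.bind(socket, "ipc:///tmp/testZMQ")

vsz("After setting up ZMQ, my VSZ=")
println("Sending")
# Send the same short message 10 million times, reporting every 100,000 sends.
for i = 1:10000000
    ZMQ.send(socket, "abcdefghijklmnopqrstuvwxyz")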
    if i % 100000 == 0
        println("Sent $i messages")
        println("Length of gc_protect: $(length(ZMQ.gc_protect))")
        vsz("My current VSZ=")
    end
end
vsz("Final VSZ=")

The virtual size keeps growing endlessly.

StefanKarpinski, Mar 22 '15

Cc: @tanmaykm @amitmurthy

ViralBShah, Mar 23 '15

Seems to be fixed. Thank you, @Keno!

StefanKarpinski, Mar 23 '15

Was this a problem with both 0.3 and 0.4?

tkelman, Mar 24 '15

Yes.

ViralBShah, Mar 24 '15

And after 791b5d4af2c2fb029e4a38b291726964a0515dcf in the package, 0.3 still leaks memory?

tkelman, Mar 24 '15

@StefanKarpinski knows the details best about what had to be done on 0.3. Let's wait for him to chime in.

ViralBShah, Mar 24 '15

I'm going to sleep. @staticfloat may be online for a little while, and can do anything necessary with binaries. If we decide to immediately backport the corresponding Julia commit and re-tag, I'd personally be in favor of leaving the 0.3.7 tag in place, since who knows how many people have fetched it by now, and going straight to 0.3.8.

tkelman, Mar 24 '15

Yes, it can certainly wait a week or two for 0.3.8.

ViralBShah, Mar 24 '15

I should have fixed this on both 0.3 and 0.4.

Keno, Mar 24 '15

So the original example still leaks memory – just much more slowly than before. Looking into the cause.

StefanKarpinski, Feb 12 '16

Cc @tanmaykm, since you are also a heavy user of this package...

ViralBShah, Feb 15 '16

This seems to be an upstream bug.

nkottary, Feb 15 '16

I can see the leak even when just opening and closing sockets.

Test code: https://gist.github.com/tanmaykm/8352059108c6b34f5ecf

tanmaykm, Feb 15 '16

I see leaks even after closing the context, following the call to doopenclose in the script above. Calling zmq_unbind for each bind prevents these. I've added it here.

This does not fix the bug.
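The bind/unbind pairing described above can be sketched as a do-block helper that always undoes its bind, plus a `close` that unbinds anything left over. This is a minimal illustration only: `MockSocket`, `bind!`, `unbind!` and `withbinding` are hypothetical stand-ins that merely record endpoints, not ZMQ.jl functions; real code would call libzmq's `zmq_unbind` where `unbind!` appears. As the comment notes, this pairing reportedly does not fix the underlying send-loop leak.

```julia
# Hypothetical mock socket that only records which endpoints are bound;
# MockSocket, bind! and unbind! are illustrative, not ZMQ.jl functions.
mutable struct MockSocket
    endpoints::Set{String}
    closed::Bool
end
MockSocket() = MockSocket(Set{String}(), false)

bind!(s::MockSocket, addr::AbstractString) = (push!(s.endpoints, String(addr)); s)
unbind!(s::MockSocket, addr::AbstractString) = (delete!(s.endpoints, String(addr)); s)

# Pair the bind with an unbind that runs even if `f` throws.
function withbinding(f, s::MockSocket, addr::AbstractString)
    bind!(s, addr)
    try
        return f(s)
    finally
        unbind!(s, addr)
    end
end

# On close, unbind anything still bound before marking the socket closed.
function Base.close(s::MockSocket)
    for addr in collect(s.endpoints)
        unbind!(s, addr)
    end
    s.closed = true
    return nothing
end

s = MockSocket()
seen = withbinding(s, "ipc:///tmp/testZMQ") do sock
    "ipc:///tmp/testZMQ" in sock.endpoints
end
```

Inside the block the endpoint is bound; once the block exits, normally or via an exception, it is unbound again.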

nkottary, Feb 15 '16

The original script to reproduce this leaks for a different reason – the calls to readall – but these scripts also show a memory leak:

[Screenshot, 2016-02-26 11:58:05 AM: memory-usage plot]
Julia Version 0.4.0
Commit 0ff703b* (2015-10-08 06:20 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY SANDYBRIDGE)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

StefanKarpinski, Feb 26 '16

Similar leakage on OS X:

[Screenshot, 2016-02-26 2:49:04 PM: memory-usage plot on OS X]
Julia Version 0.4.4-pre+26
Commit 386d77b (2016-01-29 21:53 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) M-5Y71 CPU @ 1.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

StefanKarpinski, Feb 26 '16

The only operation in the loop in this script is ZMQ.send(socket, "abcd"), so there's a leak in the code that creates the ZMQ message object and sends it. It seems highly unlikely that ZMQ's own send code has a memory leak, so I'm guessing this is about how we create message objects.

StefanKarpinski, Feb 26 '16

I suspect it's due to the finalizer.
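To illustrate why a finalizer is a plausible suspect, here is a minimal sketch of a wrapper that owns native memory and frees it only from a finalizer. `NativeBuffer`, `FREED` and `churn` are hypothetical names, not ZMQ.jl's `Message` type. The point it shows: memory obtained with `Libc.malloc` is invisible to Julia's GC, which only sees the small wrapper object, so it has little reason to collect soon, and the native bytes stay allocated until a collection eventually runs the finalizers.

```julia
# Hypothetical stand-in for a message wrapper that owns native (malloc'd) memory;
# this is not ZMQ.jl's actual Message type.
mutable struct NativeBuffer
    ptr::Ptr{UInt8}
    len::Int
end

const FREED = Ref(0)  # how many buffers the finalizer has released so far

function NativeBuffer(len::Integer)
    buf = NativeBuffer(Ptr{UInt8}(Libc.malloc(len)), Int(len))
    # The native memory is released only when the GC decides to collect `buf`.
    finalizer(buf) do b
        Libc.free(b.ptr)
        FREED[] += 1
    end
    return buf
end

# Create many short-lived buffers, as a send loop would create messages.
function churn(n)
    for _ in 1:n
        NativeBuffer(1024)
    end
    return nothing
end

churn(10_000)
freed_before_gc = FREED[]   # whatever automatic collections happened to release
GC.gc()                     # force a full collection, which runs pending finalizers
freed_after_gc = FREED[]
println("freed before GC.gc(): $freed_before_gc, after: $freed_after_gc")
```

How many buffers are freed before the forced collection depends on when the GC happens to run, which is exactly the nondeterminism that makes finalizer-managed native resources look like a slow leak.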

yuyichao, Feb 26 '16

Ah, good thought, @yuyichao!

StefanKarpinski, Feb 26 '16

I'm trying to rebase and fix https://github.com/JuliaLang/julia/pull/13995 now ...

yuyichao, Feb 26 '16

Hmm, it seems that you are plotting the virtual address space size? That's not the most useful measure, since you are mostly measuring the 8 GB GC memory pool. It also suggests that the leak is not in GC pool objects...
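One way to track resident memory alongside VSZ is to read both from `/proc/self/status`. A minimal sketch, assuming Linux (this file does not exist on OS X); `memstats_kib` is a hypothetical helper name. It also avoids spawning a `ps` subprocess for every sample, which the readall-based helper in the original script did.

```julia
# Read this process's virtual size (VmSize) and resident set size (VmRSS), in KiB,
# straight from /proc. Linux-only; no `ps` subprocess is spawned per sample.
function memstats_kib()
    stats = Dict{String,Int}()
    for line in eachline("/proc/self/status")
        occursin(':', line) || continue
        key, rest = split(line, ':'; limit = 2)
        if key == "VmSize" || key == "VmRSS"
            # e.g. "  123456 kB" -> 123456
            stats[String(key)] = parse(Int, first(split(rest)))
        end
    end
    return (vsz = stats["VmSize"], rss = stats["VmRSS"])
end

if Sys.islinux()
    m = memstats_kib()
    println("VSZ = $(m.vsz) KiB, RSS = $(m.rss) KiB")
end
```

Sampling `rss` inside the send loop instead of VSZ would separate pages actually in use from the address space the GC merely reserves.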

yuyichao, Feb 27 '16

That's a fair point and I'm happy to measure something else, but this does reflect the impact of the program from the system's perspective – and it keeps using more and more resources while doing a very trivial loop.

StefanKarpinski, Feb 29 '16

I agree; I just mean that the cause of the leak is a little strange, since it's apparently not https://github.com/JuliaLang/julia/pull/13993 and isn't really fixed by https://github.com/JuliaLang/julia/pull/13995

yuyichao, Feb 29 '16

The leak may be in libuv, like https://github.com/JuliaLang/julia/issues/13529, which is probably due to a libuv issue.

amitmurthy, Mar 01 '16

Has anybody run massif on this?

Keno, Mar 01 '16

Hey all. Looking into this, the general advice is not to rely on finalizers but to use do-block syntax instead: https://docs.julialang.org/en/latest/manual/functions/#Do-Block-Syntax-for-Function-Arguments-1. Tim Holy recommends this approach here: https://github.com/JuliaLang/julia/issues/11207#issuecomment-100469273

I think we should not rely on finalizers. The issue is that in Julia, object lifetimes aren't tied to scopes; objects are garbage-collected. So a finalizer can run at any point after the scope closes, whenever the GC happens to collect the object. That's just how resource management works in Julia. So while plain memory can be handled by Julia's GC, sockets, contexts, and messages shouldn't be, because they could hang around until the GC gets around to releasing them, which results in resource leaks (specifically, open threads and memory). Obviously, this requires a bit of a redesign.
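The scope-bound alternative can be sketched with the standard do-block pattern: a function takes the body as its first argument and closes the resource in a `finally` clause, so cleanup happens when the block exits rather than whenever the GC runs. This is a sketch under stated assumptions: `Resource`, `openresource`, `withresource` and `OPEN_COUNT` are hypothetical names, not ZMQ.jl's API.

```julia
# Hypothetical resource type standing in for a socket or message; not ZMQ.jl's API.
mutable struct Resource
    name::String
    isopen::Bool
end

const OPEN_COUNT = Ref(0)  # resources currently open

function openresource(name::AbstractString)
    OPEN_COUNT[] += 1
    return Resource(String(name), true)
end

# Idempotent close: a second call is a no-op.
function Base.close(r::Resource)
    if r.isopen
        r.isopen = false
        OPEN_COUNT[] -= 1
    end
    return nothing
end

# do-block entry point: `r` is closed when `f` returns *or* throws,
# so its lifetime is bounded by the block rather than by the GC.
function withresource(f, name::AbstractString)
    r = openresource(name)
    try
        return f(r)
    finally
        close(r)
    end
end

result = withresource("pub") do r
    uppercase(r.name)
end
```

Making `close` idempotent keeps it safe to combine with an explicit early `close(r)` inside the block, or with a finalizer kept only as a last-resort safety net.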

I can do it (in fact, I already did it in my own clone of ZMQ.jl). Would people be interested in this? The nice bit is that it really only involves removing a lot of code from ZMQ.jl and simplifying interfaces. It also brings the Julia bindings more in line with both ZMQ and Julia paradigms.

I am thinking we can remove the Julia bindings that relate to ZMQ contexts entirely. AFAICT there really is only one use case for having more than one ZMQ context: ZMQ being imported in multiple places. That can be accomplished by a global variable holding the context handle. Julia takes care of each import having its own global, and then we can hide contexts from the user nearly altogether. (If users really want to control contexts, we can make some package-level functions for that.)
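A package-level context held in a global can be sketched as a lazily initialized `Ref` guarded by a lock. In this sketch `FakeZMQ`, `Context` and `context()` are illustrative names, not ZMQ.jl's actual API. Because a package module is loaded once per Julia process, every caller of `context()` in that process gets the same object.

```julia
# Hypothetical module sketching a package-wide, lazily created context;
# FakeZMQ, Context and context() are illustrative names, not ZMQ.jl's API.
module FakeZMQ

mutable struct Context
    id::Int
end

const CREATED = Ref(0)                                   # contexts created so far
const DEFAULT_CONTEXT = Ref{Union{Nothing,Context}}(nothing)
const CONTEXT_LOCK = ReentrantLock()                     # guards first-use creation

# Return the single package-level context, creating it on first use.
function context()
    lock(CONTEXT_LOCK) do
        if DEFAULT_CONTEXT[] === nothing
            CREATED[] += 1
            DEFAULT_CONTEXT[] = Context(CREATED[])
        end
        return DEFAULT_CONTEXT[]
    end
end

end # module FakeZMQ

c1 = FakeZMQ.context()
c2 = FakeZMQ.context()
```

Creating the context on first use rather than at load time means programs that never open a socket never pay for one, and the lock keeps concurrent first calls from creating two contexts.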

Thoughts?

joelfrederico, Dec 07 '18