ZMQ.jl
memory leak
Example code:
pid = getpid()
vsz(s) = println(s*split(open(readall,`ps -p $pid -o vsz`),"\n")[2])
vsz("Initial VSZ=")
using ZMQ
vsz("After loading ZMQ, my VSZ=")
ctx = Context()
socket = Socket(ctx, PUB)
ZMQ.bind(socket, "ipc:///tmp/testZMQ")
vsz("After setting up ZMQ, my VSZ=")
println("Sending")
for i = 1:10000000
    ZMQ.send(socket, "abcdefghijklmnopqrstuvwxyz")
    if i % 100000 == 0
        println("Sent $i messages")
        println("Length of gc_protect: $(length(ZMQ.gc_protect))")
        vsz("My current VSZ=")
    end
end
vsz("Final VSZ=")
The virtual size keeps growing endlessly.
Cc: @tanmaykm @amitmurthy
Seems to be fixed. Thank you, @Keno!
Was this a problem with both 0.3 and 0.4?
Yes.
And after 791b5d4af2c2fb029e4a38b291726964a0515dcf in the package, 0.3 still leaks memory?
@StefanKarpinski knows the details best about what had to be done on 0.3. Let's wait for him to chime in.
I'm going to sleep. @staticfloat may be online for a little while, and can do anything necessary with binaries. If we decide to immediately backport the corresponding Julia commit and re-tag, I'd personally be in favor of leaving the 0.3.7 tag in place since who knows how many people have fetched it by now, and just go straight to 0.3.8.
Yes, it can certainly wait a week or two for 0.3.8.
I should have fixed this on both 0.3 and 0.4.
So the original example still leaks memory, just much more slowly than before. Looking into the cause.
Cc @tanmaykm, since you are also a heavy user of this package...
This seems to be a bug upstream
Can see the leak even with just open and close of sockets.
test code: https://gist.github.com/tanmaykm/8352059108c6b34f5ecf
I see that leaks are present even after closing the context after calling doopenclose in the above script. Calling zmq_unbind for each bind prevents these. I've added it here.
This does not fix the bug.
The original script to reproduce this leaks for a different reason (the calls to readall), but these scripts also show a memory leak:
Julia Version 0.4.0
Commit 0ff703b* (2015-10-08 06:20 UTC)
Platform Info:
System: Linux (x86_64-redhat-linux)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT NO_AFFINITY SANDYBRIDGE)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.3
Similar leakage on OS X:
Julia Version 0.4.4-pre+26
Commit 386d77b (2016-01-29 21:53 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) M-5Y71 CPU @ 1.20GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.3
The only operation in the loop in this script is ZMQ.send(socket, "abcd"), so there's a leak in the code that creates the ZMQ message object and sends it. It seems highly doubtful that ZMQ's own send code has a memory leak, so I'm guessing this is about how we are creating message objects.
I suspect it's due to the finalizer.
Ah, good thought, @yuyichao!
I'm trying to rebase and fix https://github.com/JuliaLang/julia/pull/13995 now ...
Hmm, it seems that you are plotting the virtual address space size? It's not the most useful measure, since you are mostly measuring the 8G GC memory pool. This also suggests that the leak is not in the GC pool objects.
That's a fair point and I'm happy to measure something else, but this does reflect the impact of the program from the system's perspective – and it keeps using more and more resources while doing a very trivial loop.
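For anyone wanting to measure something other than VSZ, here is a minimal sketch that reports resident memory instead. It assumes a current Julia 1.x (not the 0.3/0.4 syntax used earlier in this thread), where Sys.maxrss() returns the process's peak resident set size in bytes. The helper names rss_mib and report are made up for illustration, and the allocation is a stand-in workload rather than ZMQ traffic.

```julia
# Peak resident set size of this process in MiB.
# Sys.maxrss() returns bytes (Julia 1.x).
rss_mib() = Sys.maxrss() / 2^20

# Hypothetical helper: print a label followed by the current peak RSS.
function report(label)
    println(label, round(rss_mib(), digits=1), " MiB peak RSS")
end

report("Initial: ")
buf = [rand(UInt8, 1024) for _ in 1:10_000]   # stand-in workload, not ZMQ
report("After allocating: ")
```

Because this is a peak value it never decreases, so it is best suited to spotting steady growth across loop iterations rather than confirming that memory was released.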
I agree, I just mean that the cause of the leak is a little strange, since it's apparently not https://github.com/JuliaLang/julia/pull/13993 and isn't really fixed by https://github.com/JuliaLang/julia/pull/13995
The leak may be in libuv, just as https://github.com/JuliaLang/julia/issues/13529 is probably due to a libuv issue.
Has anybody run massif on this?
Hey all. Looking into this, the general advice is not to rely on finalizers but to use do-block syntax instead: https://docs.julialang.org/en/latest/manual/functions/#Do-Block-Syntax-for-Function-Arguments-1. This is recommended by Tim Holy: https://github.com/JuliaLang/julia/issues/11207#issuecomment-100469273
I'm thinking that we should not rely on finalizers. The issue is that lifetimes in Julia aren't strictly managed by scopes; objects are garbage-collected. A finalizer therefore runs whenever the GC gets around to it, possibly long after the scope closes. That's just how resource management is done in Julia. So while memory can be left to Julia's GC, sockets, contexts, and messages shouldn't be, because they could hang around until the GC deigns to release them, and this would result in resource leaks, specifically open threads and memory. This requires a bit of a redesign, obviously.
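As a rough illustration of the do-block approach described above, here is a sketch in current Julia 1.x syntax. FakeSocket, open_socket, close_socket, and with_socket are all hypothetical names invented for this example; none of them are ZMQ.jl's API. The point is only that try/finally closes the handle when the block exits, instead of waiting for a finalizer.

```julia
# Hypothetical stand-in for a socket-like handle; not ZMQ.jl's Socket type.
mutable struct FakeSocket
    isopen::Bool
end

open_socket() = FakeSocket(true)
close_socket(s::FakeSocket) = (s.isopen = false; nothing)

# Run f on a fresh socket and close it deterministically on exit,
# whether f returns normally or throws.
function with_socket(f)
    s = open_socket()
    try
        return f(s)
    finally
        close_socket(s)
    end
end

seen = Ref{Any}(nothing)
result = with_socket() do s
    seen[] = s        # keep a reference only to inspect the state afterwards
    s.isopen          # the socket is open while inside the block
end

println(result)            # value returned by the block
println(seen[].isopen)     # state of the socket after the block has exited
```

With this shape the release point is tied to the end of the block, so the number of open handles no longer depends on when the GC happens to run.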
I can do it (actually I already did it in my own clone of ZMQ.jl). Would people be interested in this? The nice bit is that it really only involves removing a lot of code from ZMQ.jl, simplifying interfaces. It also makes the Julia bindings more in line with both ZMQ and Julia paradigms.
I am thinking we can remove the Julia bindings that relate to ZMQ contexts entirely. AFAICT there really is only one use case for having more than one ZMQ context: ZMQ being imported in multiple places. That can be accomplished by a global variable holding the context handle. Julia takes care of each import having its own global, and then we can hide contexts from the user nearly altogether. (If users really want to control contexts, we can make some package-level functions for that.)
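A minimal sketch of the "one hidden context in a global" idea, again in Julia 1.x syntax. Everything here is hypothetical: FakeZMQ is a toy module, and its Context and Socket types are placeholders that do not call libzmq. It only shows the shape of a lazily created module-level context that sockets pick up implicitly.

```julia
module FakeZMQ   # hypothetical module, not the real ZMQ.jl

# Stand-in for a context handle; a real one would wrap a libzmq pointer.
mutable struct Context
    id::Int
end

const _context = Ref{Union{Context,Nothing}}(nothing)
const _created = Ref(0)   # counts how many contexts were ever created

# Return the module's single context, creating it on first use.
function context()
    if _context[] === nothing
        _created[] += 1
        _context[] = Context(_created[])
    end
    return _context[]::Context
end

# Sockets take the shared context implicitly, so users never pass one.
struct Socket
    ctx::Context
    kind::Symbol
end
Socket(kind::Symbol) = Socket(context(), kind)

end # module

a = FakeZMQ.Socket(:PUB)
b = FakeZMQ.Socket(:SUB)
println(a.ctx === b.ctx)   # both sockets share the one module-level context
```

A package-level accessor like context() would also be the natural hook for users who do want explicit control, as suggested above.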
Thoughts?