ZMQ.jl

Apparent memory leak running publisher weather update example

Open ReubenHill opened this issue 7 years ago • 12 comments

I get what looks like a big memory leak when running the weather update publisher from the pub/sub example (server code, client code, demo explanation on ZMQ website).

I had to update a few lines in the subscriber to make it work with Julia 0.5.x, but this shouldn't affect the behaviour of the server. The updated subscriber is below.

module testsubscriber

using ZMQ

export runtestsubscriber

function runtestsubscriber(port::Integer=5556)

  context = Context()
  socket = Socket(context, SUB)

  println("Collecting updates from weather server...")
  ZMQ.connect(socket, "tcp://localhost:$port")

  # Subscribe to zipcode, default is NYC, 10001
  zip_filter = 10001

  ZMQ.set_subscribe(socket, string(zip_filter))

  # Process 5 updates
  update_nbr = 5

  total_temp = 0
  # Iterate over the range itself; [1:update_nbr] is a one-element array
  for update in 1:update_nbr
      message = unsafe_string(ZMQ.recv(socket))
      zipcode, temperature, relhumidity = split(message)
      total_temp += parse(Int, temperature)
  end
  end

  avg_temp = total_temp / update_nbr

  println("Average temperature for zipcode $zip_filter was $(avg_temp)F")

  # Making a clean exit.
  ZMQ.close(socket)
  ZMQ.close(context)

end

end # module testsubscriber

I'm something of a novice with Julia, though a colleague who has been using it extensively reports the same problem. Am I simply using ZMQ incorrectly or is there an issue with ZMQ.jl?

ReubenHill, May 25 '17 09:05

I was running on Windows 10 (x64) and my colleague on macOS.

ReubenHill, May 25 '17 09:05

How are you detecting that it is a memory leak?

stevengj, May 26 '17 17:05

Relatively crudely: memory usage in task manager goes from under 100 MB to the PC maximum within about 10 seconds. It may not be a memory leak but it certainly doesn't seem to be normal behaviour; I end up having to kill the Julia process in task manager.

After a bit of searching, it looks as though the high water mark for the publisher needs to be set (see this Stack Overflow question). I've queried the high water marks for the publisher socket using ZMQ.get_rcvhwm and ZMQ.get_sndhwm; both return 1000. Using ZMQ.set_rcvhwm and ZMQ.set_sndhwm to change the value for either the publisher or the subscriber doesn't seem to make any difference.

Am I missing something obvious? I'm only really getting going with ZMQ in general. It seems odd to me, however, that one of the base examples wouldn't just work out of the box.

ReubenHill, May 30 '17 09:05

I haven't looked at that example, but it does seem strange to me.

Why do you think the HWM should matter? Shouldn't the default of 1000 messages be small enough not to run out of memory? These messages are only a few bytes each.

stevengj, May 30 '17 17:05

I just tried it, and it looks like the memory problem is in the server code:

using ZMQ

context = Context()
socket = Socket(context, PUB)
ZMQ.bind(socket, "tcp://*:5556")


while true
    zipcode = @sprintf("%05d",rand(1:99999))
    temperature = rand(-80:135)
    relhumidity = rand(10:60)
    ZMQ.send(socket, "$zipcode $temperature $relhumidity")
end

ZMQ.close(socket)
ZMQ.close(context)

stevengj, May 30 '17 17:05

It looks like the leak is eliminated by adding a yield() call in the while true loop of the publisher. Can you confirm?

ZMQ.jl by default sends string messages as zero-copy messages, which is safe since string data is immutable. We register a callback with libzmq to tell us when libzmq is done with the data, so that Julia can free it. Because this callback needs to be threadsafe, however, it only posts an event to Julia's (libuv) event loop, so that the data is released the next time Julia's event loop runs. Normally, the event loop runs whenever you e.g. do I/O. In the above code, however, the Julia event loop apparently never gets called, so the resources never get freed unless you add a yield (or println or some other I/O).

Really, the ZMQ.send function should be fixed to make sure that it always lets the event loop run. (I thought it was already doing this because it calls the libuv-based wait and notify functions, but apparently not.)
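
To picture the effect without ZMQ at all, here is a rough Base-only sketch. It is an analogy under stated assumptions, not ZMQ.jl's actual implementation: `send_all` and its `release_queue` are hypothetical names, the channel stands in for the "done with this buffer" events posted by libzmq, and the `@async` task stands in for the event loop that frees the data. The point is only that the releasing task cannot run until the sending loop gives up control.

```julia
# Hypothetical stand-in for deferred buffer release; not ZMQ.jl internals.
function send_all(n::Integer; cooperative::Bool)
    release_queue = Channel{String}(Inf)  # buffers waiting to be released
    released = Ref(0)

    # Stand-in for the event loop: it only runs when the sender yields.
    event_loop = @async begin
        for _ in release_queue
            released[] += 1
        end
    end

    backlog = 0  # largest number of buffers still awaiting release
    for i in 1:n
        put!(release_queue, "10001 72 45")  # queue a release; nothing is freed yet
        cooperative && yield()              # give the event loop a turn
        backlog = max(backlog, i - released[])
    end

    close(release_queue)
    wait(event_loop)  # drain whatever is left before returning
    return backlog
end

println("backlog without yield: ", send_all(1_000; cooperative=false))
println("backlog with yield:    ", send_all(1_000; cooperative=true))
```

Without the `yield()`, the backlog should grow with every iteration, which is the same shape as the unbounded memory growth in the publisher; with it, the backlog should stay small.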

stevengj, May 30 '17 17:05

@vtjnash, what should we be doing instead of notify(socket) in ZMQ.send in order to ensure that the message-free events get processed eventually?

stevengj, May 30 '17 17:05

Since Julia tasks are cooperatively multithreaded, I think that while loop just needs to run yield explicitly. It feels a little odd for that example code to be demonstrating how to create a CPU-bound workload (no sleep calls) anyway.

vtjnash, May 30 '17 18:05

(I also wonder whether we should be using this complicated zero-copy method for sending strings anyways; for small strings, it might be faster just to make a copy?)

stevengj, May 30 '17 19:05

@stevengj I can confirm that adding yield() after ZMQ.send(socket, "$zipcode $temperature $relhumidity") resolves the memory issue.

using ZMQ

context = Context()
socket = Socket(context, PUB)
ZMQ.bind(socket, "tcp://*:5556")


while true
    zipcode = @sprintf("%05d",rand(1:99999))
    temperature = rand(-80:135)
    relhumidity = rand(10:60)
    ZMQ.send(socket, "$zipcode $temperature $relhumidity")
    yield()
end

ZMQ.close(socket)
ZMQ.close(context)

ReubenHill, May 31 '17 09:05

Could you submit a PR to fix the tutorial? Or maybe use sleep(0.1) instead?
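
For comparing the two options, a small Base-only sketch may help. It assumes nothing about ZMQ: `ticks_during` is a hypothetical helper that counts how often a background task gets to run while a loop executes a given pause between iterations. Both `yield()` and `sleep` hand control to the scheduler; `sleep` additionally throttles the loop.

```julia
# Hypothetical helper for illustration; counts background-task turns.
function ticks_during(pause::Function, iterations::Integer)
    ticks = Ref(0)
    running = Ref(true)

    # Background task standing in for pending event-loop work.
    background = @async begin
        while running[]
            ticks[] += 1
            yield()
        end
    end

    for _ in 1:iterations
        pause()  # whatever the loop does between sends
    end

    running[] = false
    wait(background)
    return ticks[]
end

println("no pause:    ", ticks_during(() -> nothing, 5))
println("yield():     ", ticks_during(yield, 5))
println("sleep(0.01): ", ticks_during(() -> sleep(0.01), 5))
```

With no pause the background task should never get a turn while the loop runs; with either `yield()` or `sleep` it should. The trade-off is that `sleep(0.1)` caps the publisher at roughly ten messages per second, whereas `yield()` keeps it running as fast as it can.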

stevengj, Jun 06 '17 17:06

I've submitted a pull request using the changes here, see https://github.com/booksbyus/zguide/pull/697.

ReubenHill, Jun 07 '17 09:06