jeromq
TCP connections left in ESTABLISHED state and not cleaned up.
I have a reasonably standard ROUTER/DEALER setup. The server uses ROUTER sockets and many thousands of clients connect with DEALER sockets using pre-agreed identities, so I can route messages from the server to individual clients. So far so good. But I've found that if I cut off the clients, by killing the process or pulling out the network cable, some of the sockets (not all of them, but a few) are left open on the server in the ESTABLISHED state. I can't find a decent way of getting at the underlying sockets to shut them down, so it seems we have a resource leak?
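For reference, the shape of the setup is roughly this (a minimal sketch against the JeroMQ 0.5.x API; the endpoint, port, and identity are placeholders, not my actual values):

```java
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class RouterServer {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            // Server side: one ROUTER socket handles all clients
            ZMQ.Socket router = ctx.createSocket(SocketType.ROUTER);
            router.bind("tcp://*:5555");

            while (!Thread.currentThread().isInterrupted()) {
                // A DEALER peer arrives on the ROUTER as [identity][payload]
                byte[] identity = router.recv(0);
                byte[] payload  = router.recv(0);

                // Reply to that specific client by prefixing its identity
                router.send(identity, ZMQ.SNDMORE);
                router.send(payload, 0);
            }
        }
    }

    // Client side: DEALER with a pre-agreed identity, set before connect()
    static ZMQ.Socket connectClient(ZContext ctx, String identity) {
        ZMQ.Socket dealer = ctx.createSocket(SocketType.DEALER);
        dealer.setIdentity(identity.getBytes(ZMQ.CHARSET));
        dealer.connect("tcp://server-host:5555");
        return dealer;
    }
}
```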
I've added some debug logging and can see that StreamEngine.unplug() is called when I disconnect the clients, so is it possible the handle isn't always being closed? In fact I guess that's deliberate, in case the client reconnects? Still, in the majority of cases the client socket clearly is closed. I haven't verified that the number of calls to unplug() matches the number of clients disconnecting; that's also a possibility, and I'll need to debug further to be sure.
Well, OK. I've found a solution, but it's not particularly satisfactory: it means setting TCP keepalive at the system level and making sure ZMQ is using TCP keepalive as well. That's OK in that at least the connections get cleaned up, but without heartbeats my app still has no idea the client has gone away. I guess it comes down to this: what strategies do people use when you need to heartbeat 10,000 or even 100,000 clients, given that the heartbeats themselves quickly start using all available processor time? You certainly can't heartbeat 100,000 devices every second. Or can you? Does anyone have experience of this?
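For anyone else hitting this, the keepalive workaround looks roughly like this (a sketch using JeroMQ's ZMQ.Socket keepalive setters; the port and timings are placeholders, and the idle/interval/count options only take effect on platforms where the JVM/OS exposes them):

```java
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class KeepAliveRouter {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket router = ctx.createSocket(SocketType.ROUTER);

            // Ask the OS to probe idle connections so half-dead peers are reaped.
            // 1 = enabled; idle/interval/count map onto the kernel's
            // tcp_keepalive_time / _intvl / _probes. Set these before bind().
            router.setTCPKeepAlive(1);
            router.setTCPKeepAliveIdle(60);      // seconds idle before the first probe
            router.setTCPKeepAliveInterval(10);  // seconds between probes
            router.setTCPKeepAliveCount(5);      // failed probes before the kernel drops the connection

            router.bind("tcp://*:5555");
            // ... normal recv/send loop as before ...
        }
    }
}
```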