jeromq icon indicating copy to clipboard operation
jeromq copied to clipboard

EmptySlots leak

Open Verdoso opened this issue 8 years ago • 8 comments

Hi, We are having an issue where when we stress a JeroMQ Context with simultenous requests, it seems that the emptySlots deque is not being replenished correctly, so you end up running out of empty slots and from then on, trying to create any kind of socket results in an IllegalStateException("EMFILE").

I've donwloaded the latest code form the repo and I can see the emptySlots deque does not recover all the slots, but I'm not sure Im using it the library correcly or it is a bug.

My client creates a commons GenericObjectPool of ZContext/Sockets and creates each pair in the pool with

final ZContext shadowed = ZContext.shadow(context);
final ZMQ.Socket socket = shadowed.createSocket(ZMQ.REQ);
socket.setLinger(0);
socket.connect(address);

When we stress the client, so many objects in the pool are created and then discarded because they go over the "iddle" limit or we have timeouts from the server, the pairs are "destroyed" like this:

shadowed.destroySocket(socket);
shadowed.close();

The destroySocket does not call directly Ctx.destroySocket but leaves this job to the Reaper, and some times it is not calling the Ctx.destroySocket, due to some conditions not being met in Own.checkTermAcks(), so you begin to slowly lose emptySlots.

Again, I'm not sure if the problem is the way we are creating/destroying the sockets, but seeing that processedSeqnum, sendSeqnum and termAcks are not synced in any way and the Reaper thread and the threads reading from the sockets might check/modify them simultaneously, I was wondering if it might be a race condition showing its ugly face.

Thanks, D.

Verdoso avatar Jul 11 '16 10:07 Verdoso

I've never had any success sharing a socket between two (or more) threads. Which, I'm assuming you're doing using that generic object pool.

ptwohig avatar May 15 '19 16:05 ptwohig

If you look at the source for Socket, I think the error state is handed through a ThreadLocal and there's not really any logic in there to transfer that from one thread to another. I'm pretty sure Sockets are pretty tightly coupled to the thread that creates them.

ptwohig avatar May 15 '19 16:05 ptwohig

No, there was no sharing of sockets between pools. That's what the pool was there for, to have a number of sockets open to be able to do work in different threads without sharing them.

I'm pretty convinced that it was a race condition in the closing functionality, but I was not given the chance to play with it. As there was not even the ackowledgement of a bug, much less the prospect of a fix, the customer decided to skip ZeroMQ altogether and I had to drop it :).

Verdoso avatar May 15 '19 16:05 Verdoso

No, there was no sharing of sockets between pools.

At the same time, or ever?

ptwohig avatar May 15 '19 16:05 ptwohig

'cause I tried doing something like this:

  1. Thread 1: Checks a java.util.ConcurrentQueue (the non-blocking variety), if empty create a socket
  2. Thread 1: Perform Work using Socket
  3. Thread 1: Replace socket for the next Thread to use
  4. Thread 2: Dequeue Socket from ConcurrentQueue
  5. Thread 2: Repeat Steps 1-3

Never used Commons ObjectPool, but seems similar. The issue I had was that while the code guaranteed only one thread at a time was using the Socket, internally the Socket was making a lot of assumptions about always being used by the same thread. Messages simply got lost in the shuffle.

I asked about it, I forget where, and the response was more or less, "Sharing a Socket among threads is an anti-pattern." And that was pretty much the end of that conversation.

ptwohig avatar May 15 '19 16:05 ptwohig

No sharing of sockets between threads, simultaneously. Reusing the socket on a different thread should be no issue. That was what we were doing all the time.

The issue appeared when the network became flaky and we had to close sockets and create them anew. ZeroMq would not count correctly the number of sockets closed and would complain about "too many sockets open". JeroMq uses a counter to prevent you from using too many sockets and kill the system, which is good, but that counter is not protected from race conditions, which is bad. Hence the problem.

Verdoso avatar May 15 '19 16:05 Verdoso

Reusing the socket on a different thread should be no issue. That was what we were doing all the time.

Yeah, that's where I ran into issues :-/

ptwohig avatar May 15 '19 18:05 ptwohig

JeroMq uses a counter to prevent you from using too many sockets and kill the system, which is good, but that counter is not protected from race conditions, which is bad. Hence the problem.

~Did you know about this? Following this solved lots of issues for me: https://github.com/zeromq/jeromq/wiki/Sharing-ZContext-between-thread~

I'm an idiot, it's in your original report.

I may also revisit my connection pool approach if you've had success with it.

ptwohig avatar May 15 '19 18:05 ptwohig