jeromq Exception in thread "iothread-2" java.lang.OutOfMemoryError: GC overhead limit exceeded

After running in ROUTER/DEALER mode for a long time, this error is reported. What causes it?

Memory suddenly increases after a long period of operation.

Exception in thread "iothread-2" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at zmq.pipe.YQueue$Chunk.<init>(YQueue.java:16)
    at zmq.pipe.YQueue.<init>(YQueue.java:47)
    at zmq.pipe.YPipe.<init>(YPipe.java:32)
    at zmq.pipe.Pipe.pair(Pipe.java:127)
    at zmq.io.SessionBase.processAttach(SessionBase.java:357)
    at zmq.ZObject.processCommand(ZObject.java:73)
    at zmq.Command.process(Command.java:75)
    at zmq.io.IOThread.inEvent(IOThread.java:80)
    at zmq.poll.Poller.run(Poller.java:273)
    at java.lang.Thread.run(Thread.java:748)

Mar 12 '18 05:03 gaoqi-code

Hello,

that's too broad an error to investigate without more information; it can come from many sources.

I will ask some questions to try to narrow the focus:

Did you profile the application to get an idea of which objects are not being GCed?
Could you give us your configuration (JeroMQ version, Java version, OS, ...)?
Do you have a compilable code example that demonstrates the memory increase?
Which components do you use in your application: zmq, org.zeromq, sockets, zloop, zauth, ...?

Mar 12 '18 22:03 fredoboulo

JeroMQ 0.4.3, JDK 1.8.0_151, CentOS Linux release 7.3.1611 (Core)

Running the following code for 48 hours causes the memory overflow:

import org.zeromq.ZContext;
import org.zeromq.ZMQ;

// Class name is arbitrary; only main() comes from the original report.
public class ProxyOomRepro {

    public static void main(String[] args) throws InterruptedException {

        // 500 client threads share one ZContext; each loop iteration uses a short-lived REQ socket.
        ZContext context = new ZContext();
        context.getContext().setMaxSockets(65536);
        for (int i = 0; i < 500; i++) {
            new Thread(() -> {
                while (true) {
                    ZMQ.Socket client = context.createSocket(ZMQ.REQ);
                    client.setReceiveTimeOut(5000); // set socket options before connecting
                    client.connect("tcp://127.0.0.1:5000");
                    client.send("test");
                    String reply = client.recvStr(); // null if the 5 s timeout expires
                    context.destroySocket(client);
                }
            }).start();
        }

        // Proxy: ROUTER frontend for the clients, DEALER backend for the worker.
        new Thread(() -> {
            ZContext ctx = new ZContext();
            ctx.setSndHWM(1000);
            ctx.setRcvHWM(1000);
            ZMQ.Socket frontend = ctx.createSocket(ZMQ.ROUTER);
            frontend.bind("tcp://*:5000");
            ZMQ.Socket backend = ctx.createSocket(ZMQ.DEALER);
            backend.bind("tcp://*:5001");
            ZMQ.proxy(frontend, backend, null); // blocks while the proxy runs
            ctx.destroy();
        }).start();

        // Single REP worker answering every request with "result".
        new Thread(() -> {
            ZContext context1 = new ZContext();
            ZMQ.Socket server = context1.createSocket(ZMQ.REP);
            server.connect("tcp://127.0.0.1:5001");
            try {
                while (true) {
                    byte[] request = server.recv();
                    server.send("result");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            context1.close();
        }).start();
    }
}

Mar 15 '18 06:03 gaoqi-code

One instance of "zmq.Mailbox" loaded by "org.springframework.boot.loader.LaunchedURLClassLoader @ 0x700009090" occupies 3,149,183,984 (99.71%) bytes. The memory is accumulated in one instance of "zmq.pipe.YQueue" loaded by "org.springframework.boot.loader.LaunchedURLClassLoader @ 0x700009090".

Keywords: org.springframework.boot.loader.LaunchedURLClassLoader @ 0x700009090, zmq.pipe.YQueue, zmq.Mailbox

Mar 15 '18 07:03 gaoqi-code

I have also encountered similar problems.

Mar 15 '18 10:03 emanon-k

Hi,

I've had a hard time finding enough bandwidth for this, but I made some interesting discoveries with your SSCCE, which is indeed short but involves a whole set of complexity.

I tried it with different versions of JeroMQ, and it fails as far back as 0.3.3. I did not dig further back, but that's already something.

I was able to determine that the OOME occurs in the proxy, not in the clients or the worker.

Here is my understanding of the situation. I may not be accurate, and there could be several other reasons, but:

The worker (REP socket) accumulates received messages, up to the point where the DEALER can no longer send it any more messages. From then on, the proxy cannot write messages to the DEALER. In the meantime, the ROUTER socket keeps receiving messages. Since each connected REQ socket sends only one message, one chunk is allocated in a queue to hold that message, per connected socket. These chunks accumulate over and over because they cannot be transferred to the DEALER. And... OOME.
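The accumulation described above can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not JeroMQ code: it assumes that each attached connection gets its own pipe whose queue allocates a fixed-size chunk up front (the stack trace at the top of this thread shows `YQueue$Chunk` being allocated from `Pipe.pair` during `SessionBase.processAttach`), and the chunk size of 256 slots is an assumption borrowed from libzmq's message pipe granularity. `PipeChunkModel` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT JeroMQ code: one pipe per connected REQ peer,
// each pipe pre-allocating a fixed-size chunk even if it holds a single message.
class PipeChunkModel {
    // Assumed chunk size in slots (libzmq uses 256 for message pipes).
    static final int CHUNK_SLOTS = 256;

    // A pipe that eagerly allocates its first chunk, like a chunked queue.
    static final class Pipe {
        final Object[] chunk = new Object[CHUNK_SLOTS];
        int size = 0;

        void write(Object msg) {
            chunk[size++] = msg; // one message per REQ peer in this scenario
        }
    }

    // Pipes whose messages could not be forwarded to the DEALER side.
    final List<Pipe> stuckPipes = new ArrayList<>();

    // One short-lived REQ connection sends one request that stays stuck.
    void connectAndSendOne(String request) {
        Pipe pipe = new Pipe();
        pipe.write(request);
        stuckPipes.add(pipe);
    }

    long allocatedSlots() {
        return (long) stuckPipes.size() * CHUNK_SLOTS;
    }

    long usedSlots() {
        long used = 0;
        for (Pipe p : stuckPipes) {
            used += p.size;
        }
        return used;
    }

    public static void main(String[] args) {
        PipeChunkModel router = new PipeChunkModel();
        for (int i = 0; i < 10_000; i++) {
            router.connectAndSendOne("test");
        }
        System.out.println("messages held:   " + router.usedSlots());      // 10000
        System.out.println("slots allocated: " + router.allocatedSlots()); // 2560000
    }
}
```

In this model, memory grows with the number of undrained connections times the chunk size rather than with the number of messages, which is consistent with the proxy-side growth described above when hundreds of clients keep opening short-lived connections.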

Does that make sense?

Anyhow, is this representative of your code? Adding more workers looks like it would help, especially given the number of clients hammering the proxy with short-lived connections.

Apr 06 '18 21:04 fredoboulo

I tried adding more workers, but it doesn't help; there is a problem. For example, one worker can handle up to 5000 messages per second, but with two workers each handles only 2500, so nothing changes. I will try the latest code and see how it works.
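The observation that two workers at 2500 messages per second each change nothing fits a simple rate argument: if requests arrive faster than the total service rate, the backlog grows linearly no matter how that rate is split among workers. The sketch below only does that arithmetic; `BacklogEstimate` is a hypothetical name, and the arrival rate of 6000 messages per second is an invented example, not a measurement from this thread.

```java
// Simple queueing arithmetic (no ZeroMQ involved): how fast does a backlog grow?
class BacklogEstimate {
    // Messages added to the backlog per second; zero if service keeps up.
    static long growthPerSecond(long arrivalsPerSec, long[] workerRatesPerSec) {
        long totalService = 0;
        for (long r : workerRatesPerSec) {
            totalService += r; // capacity is the sum over all workers
        }
        return Math.max(0, arrivalsPerSec - totalService);
    }

    // Backlog after running for the given number of hours at constant rates.
    static long backlogAfterHours(long arrivalsPerSec, long[] workerRatesPerSec, long hours) {
        return growthPerSecond(arrivalsPerSec, workerRatesPerSec) * hours * 3600;
    }

    public static void main(String[] args) {
        long arrivals = 6000;             // hypothetical arrival rate
        long[] oneWorker = {5000};
        long[] twoWorkers = {2500, 2500}; // same total capacity, split in two
        System.out.println(growthPerSecond(arrivals, oneWorker));       // 1000
        System.out.println(growthPerSecond(arrivals, twoWorkers));      // 1000
        System.out.println(backlogAfterHours(arrivals, oneWorker, 48)); // 172800000
    }
}
```

Adding workers only helps if it raises the total service rate; if each extra worker just lowers the per-worker rate (for example because the workers share a saturated resource), the growth rate stays the same.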

Apr 13 '18 01:04 emanon-k

@learingL Is it solved? I have encountered the same problem.

I sent a lot of messages on a ZMQ.PUB socket, which directly caused an OutOfMemoryError.

Aug 20 '19 11:08 xjsunup

I have been trying, without success, to work around this issue using 0.4.3. When I create the socket, I set a receive timeout, a linger time, and a HWM.

this._zmqContext = new ZContext(1);
this._zmqContext.setLinger(RECIEVE_TIMEOUT_MS);
this._zmqContext.setRcvHWM(HWM_MESSAGES);
    
this._socket = this._zmqContext.createSocket(ZMQ.SUB);
this._socket.setReceiveTimeOut(RECIEVE_TIMEOUT_MS);

When I don't read from the socket fast enough, I eventually get an exception from an iothread.

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "iothread-2"
java.lang.OutOfMemoryError: Java heap space

Why is this happening if I set the HWM? Next, I try to close the context, so I first set linger to zero.

this._zmqContext.setLinger(0);
this._zmqContext.close();

The close() hangs indefinitely, similar to https://github.com/zeromq/jeromq/issues/543. This is probably because it is waiting for a socket to close, which never happens because the I/O thread died.

My best guess is that something in the iothread continues to consume heap even after the RcvHWM is reached. Is there something I could be doing wrong?
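One point worth checking against that guess: ZeroMQ high-water marks are counted in messages, not bytes, and apply per peer (one pipe per connection), so a HWM alone does not put a fixed bound on heap usage. The back-of-the-envelope sketch below assumes the worst case where every pipe on the receiving side is full; `HwmBudget` is a hypothetical name, and the message size and peer count are invented for illustration. It also ignores internal command queues, which is where the heap dump earlier in this thread found the memory.

```java
// Back-of-the-envelope bound on bytes queued on one side of a socket,
// assuming every per-peer pipe is filled up to its high-water mark.
class HwmBudget {
    static long worstCaseBytes(long hwmMessages, long avgMessageBytes, long connectedPeers) {
        // HWM limits the message COUNT per pipe; bytes scale with message size and pipe count.
        return hwmMessages * avgMessageBytes * connectedPeers;
    }

    public static void main(String[] args) {
        long hwm = 1000;           // e.g. a receive HWM of 1000 messages
        long msgBytes = 1_000_000; // hypothetical 1 MB messages
        long peers = 4;            // hypothetical number of connected publishers
        long bytes = worstCaseBytes(hwm, msgBytes, peers);
        System.out.println(bytes / (1024 * 1024) + " MiB"); // prints "3814 MiB"
    }
}
```

With large messages, a HWM of a few thousand can therefore still exceed a default-sized heap, so the message size matters as much as the HWM value.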

Update: Here is where the close hangs.

"thread" #25 prio=5 os_prio=0 cpu=127100.79ms elapsed=6919.25s tid=0x00007ff81098d000 nid=0x2f06 runnable  [0x00007ff7c16b7000]
java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPoll.wait(java.base@…/Native Method)
    at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@…/EPollSelectorImpl.java:120)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@…/SelectorImpl.java:124)
    - locked <0x00000007040b3cb0> (a sun.nio.ch.Util$2)
    - locked <0x00000007040b39c0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(java.base@…/SelectorImpl.java:136)
    at zmq.Signaler.waitEvent(Signaler.java:130)
    at zmq.Mailbox.recv(Mailbox.java:90)
    at zmq.Ctx.terminate(Ctx.java:249)
    at org.zeromq.ZMQ$Context.term(ZMQ.java:357)
    at org.zeromq.ZContext.destroy(ZContext.java:108)
    at org.zeromq.ZContext.close(ZContext.java:315)
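Until the cause of the hang is found, one defensive option is to run the blocking close on a separate daemon thread and wait for it with a timeout, so the caller is not blocked forever. The helper below is a generic sketch using only `java.util.concurrent`; `BoundedClose` and `closeWithTimeout` are hypothetical names, and in real code the `Runnable` could wrap the context's `close()` (for example `context::close`). It does not fix the leak: a close that times out leaves the context and its threads behind.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: run a possibly-hanging close action with a time limit.
class BoundedClose {
    // Returns true if the action finished within the timeout, false otherwise.
    static boolean closeWithTimeout(Runnable closeAction, long timeoutMs) {
        // Daemon thread so a hung close cannot keep the JVM alive on exit.
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true);
            return t;
        });
        try {
            exec.submit(closeAction).get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the close is still running (or hung) in the background
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("close action failed", e.getCause());
        } finally {
            exec.shutdownNow(); // attempts to interrupt the close thread if still blocked
        }
    }

    public static void main(String[] args) {
        // Stand-in for a close() that never returns on its own.
        Runnable hangingClose = () -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
                // interrupted by shutdownNow()
            }
        };
        System.out.println(closeWithTimeout(hangingClose, 200)); // false
        System.out.println(closeWithTimeout(() -> { }, 200));    // true
    }
}
```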

Aug 18 '20 16:08 chadj2
