jeromq

Exception in thread "iothread-2" java.lang.OutOfMemoryError: GC overhead limit exceeded

Open: gaoqi-code opened this issue 7 years ago (9 comments)

After using the ROUTER/DEALER pattern for a long time, this error is reported. What causes it?

Memory increases suddenly after running for a long time.

Exception in thread "iothread-2" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at zmq.pipe.YQueue$Chunk.<init>(YQueue.java:16)
    at zmq.pipe.YQueue.<init>(YQueue.java:47)
    at zmq.pipe.YPipe.<init>(YPipe.java:32)
    at zmq.pipe.Pipe.pair(Pipe.java:127)
    at zmq.io.SessionBase.processAttach(SessionBase.java:357)
    at zmq.ZObject.processCommand(ZObject.java:73)
    at zmq.Command.process(Command.java:75)
    at zmq.io.IOThread.inEvent(IOThread.java:80)
    at zmq.poll.Poller.run(Poller.java:273)
    at java.lang.Thread.run(Thread.java:748)

gaoqi-code, Mar 12 '18 05:03

Hello,

that error is too broad to investigate without more information; it could come from many different sources.

I will ask some questions to try to narrow the focus:

  • Did you profile the application to see which objects are not being garbage-collected?
  • Could you give us your configuration (JeroMQ version, Java version, OS, ...)?
  • Do you have a compilable code example that demonstrates the increase in memory?
  • Which components do you use in your application: zmq, org.zeromq, sockets, zloop, zauth, ...?

fredoboulo, Mar 12 '18 22:03

JeroMQ 0.4.3, JDK 1.8.0_151, CentOS Linux release 7.3.1611 (Core)

Running the following code for 48 hours causes a memory overflow:

import org.zeromq.ZContext;
import org.zeromq.ZMQ;

// Class name is illustrative; the snippet originally showed only main().
public class ProxyOomRepro {

    public static void main(String[] args) throws InterruptedException {

        // 500 client threads sharing one context, each looping over short-lived REQ sockets
        ZContext context = new ZContext();
        context.getContext().setMaxSockets(65536);
        for (int i = 0; i < 500; i++) {
            new Thread(() -> {
                while (true) {
                    ZMQ.Socket client = context.createSocket(ZMQ.REQ);
                    client.connect("tcp://127.0.0.1:5000");
                    client.send("test");
                    client.setReceiveTimeOut(5000);
                    String reply = client.recvStr(); // null if no reply within 5 s
                    context.destroySocket(client);
                }
            }).start();
        }

        // Proxy: ROUTER frontend for the clients, DEALER backend for the worker
        new Thread(() -> {
            ZContext ctx = new ZContext();
            ctx.setSndHWM(1000);
            ctx.setRcvHWM(1000);
            ZMQ.Socket frontend = ctx.createSocket(ZMQ.ROUTER);
            frontend.bind("tcp://*:5000");
            ZMQ.Socket backend = ctx.createSocket(ZMQ.DEALER);
            backend.bind("tcp://*:5001");
            ZMQ.proxy(frontend, backend, null);
            ctx.destroy();
        }).start();

        // Single REP worker connected to the proxy's backend
        new Thread(() -> {
            ZContext context1 = new ZContext();
            ZMQ.Socket server = context1.createSocket(ZMQ.REP);
            server.connect("tcp://127.0.0.1:5001");
            try {
                while (true) {
                    byte[] request = server.recv();
                    server.send("result");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            context1.close();
        }).start();
    }
}

gaoqi-code, Mar 15 '18 06:03

One instance of "zmq.Mailbox" loaded by "org.springframework.boot.loader.LaunchedURLClassLoader @ 0x700009090" occupies 3,149,183,984 (99.71%) bytes. The memory is accumulated in one instance of "zmq.pipe.YQueue" loaded by "org.springframework.boot.loader.LaunchedURLClassLoader @ 0x700009090".

Keywords org.springframework.boot.loader.LaunchedURLClassLoader @ 0x700009090 zmq.pipe.YQueue zmq.Mailbox

gaoqi-code, Mar 15 '18 07:03

1

gaoqi-code, Mar 15 '18 07:03

I have also encountered similar problems.

emanon-k, Mar 15 '18 10:03

Hi,

I've had a hard time finding enough bandwidth for this, but I made some interesting discoveries with your SSCCE, which is indeed short but involves a whole set of complexities.

I tried it with different versions of JeroMQ, and it fails in all of them back to 0.3.3. I did not dig further back, but that's already something.

I was able to determine that the OOME occurs in the proxy, not in the clients or the worker.

Here is my understanding of the situation. I may not be accurate, and there could be other causes, but:

The worker (REP socket) accumulates received messages until the DEALER can no longer send it any more. From then on, nothing more can be written to the DEALER. Meanwhile, the ROUTER socket keeps receiving messages. Because each connected REQ socket sends only one message, one chunk is allocated in the queue to hold that message, per connected socket. These chunks accumulate over and over because they cannot be transferred to the DEALER. And... OOME.
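To make that explanation concrete, here is a toy Java model using only the JDK; it is not JeroMQ's YQueue. It assumes that each new connection gets its own pipe whose queue eagerly allocates one fixed-size chunk, and the chunk size of 256 slots is an illustrative assumption. The class and method names are hypothetical. It only shows that memory then grows with the number of stuck connections rather than with the number of queued messages.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not JeroMQ code): each connection gets its own pipe, and each
 * pipe's queue pre-allocates a whole chunk even if only one message is queued.
 */
public class ChunkAccumulationModel {

    // Assumed chunk size, for illustration only.
    static final int CHUNK_SIZE = 256;

    /** Hypothetical stand-in for a per-pipe queue that allocates in chunks. */
    static final class ChunkedQueue {
        private final Object[] chunk = new Object[CHUNK_SIZE]; // allocated eagerly
        private int size;

        void push(Object msg) {
            chunk[size++] = msg;
        }

        int allocatedSlots() {
            return chunk.length;
        }

        int size() {
            return size;
        }
    }

    /** Simulates n REQ connections whose single request is never forwarded. */
    static List<ChunkedQueue> simulateStuckConnections(int n) {
        List<ChunkedQueue> pipes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ChunkedQueue q = new ChunkedQueue();
            q.push("test"); // one request per connection
            pipes.add(q);   // never drained because the backend is blocked
        }
        return pipes;
    }

    public static void main(String[] args) {
        List<ChunkedQueue> pipes = simulateStuckConnections(10_000);
        long messages = pipes.stream().mapToLong(ChunkedQueue::size).sum();
        long slots = pipes.stream().mapToLong(ChunkedQueue::allocatedSlots).sum();
        System.out.println(messages + " messages held in " + slots + " allocated slots");
    }
}
```

With 10,000 stuck connections, main() prints "10000 messages held in 2560000 allocated slots": the allocation follows the connection count, which is the per-connection accumulation described above.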

Does that make sense?

Anyhow, is that representative of your code? It looks like adding more workers would help, especially given the number of clients hammering the proxy with short-lived connections.
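The suggestion to add workers rests on a simple rate argument. The sketch below assumes requests arrive at a fixed rate and that each worker contributes a fixed service rate; the rates and names are hypothetical. Whenever arrivals exceed total worker capacity, the backlog (and the memory holding it) grows linearly with time.

```java
/**
 * Back-of-the-envelope model (an assumption, not a measurement): requests
 * arrive at arrivalRate per second and each worker serves serviceRate per
 * second. Any excess piles up in queues.
 */
public class BacklogModel {

    /** Net growth of the queued backlog per second; zero when workers keep up. */
    static long backlogGrowthPerSecond(long arrivalRate, long serviceRate, int workers) {
        long capacity = serviceRate * workers;
        return Math.max(0, arrivalRate - capacity);
    }

    /** Requests left queued after the given number of seconds. */
    static long backlogAfter(long seconds, long arrivalRate, long serviceRate, int workers) {
        return seconds * backlogGrowthPerSecond(arrivalRate, serviceRate, workers);
    }

    public static void main(String[] args) {
        // Hypothetical rates, for illustration only.
        long arrival = 8000;
        long perWorker = 5000;
        long seconds = 48L * 3600; // the 48-hour run mentioned earlier
        System.out.println("1 worker:  " + backlogAfter(seconds, arrival, perWorker, 1));
        System.out.println("2 workers: " + backlogAfter(seconds, arrival, perWorker, 2));
    }
}
```

The model only helps if each added worker really adds capacity. If the bottleneck is elsewhere, for example in the single proxy thread, adding workers leaves the gap between arrivals and capacity unchanged.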

fredoboulo, Apr 06 '18 21:04

I tried adding more workers, but it doesn't help; there is still a problem. For example, one worker can process up to 5000 per second, but with two workers each one processes only 2500, so nothing changes. I will try the latest code and see whether that works.

emanon-k, Apr 13 '18 01:04

@learingL Is it solved? I have encountered the same problem.

I sent a lot of messages on a ZMQ.PUB socket, which directly caused an OutOfMemoryError.


xjsunup, Aug 20 '19 11:08

I have been trying, without success, to work around this issue using 0.4.3. When I create the socket, I set a receive timeout, a linger time, and an HWM:

this._zmqContext = new ZContext(1);
this._zmqContext.setLinger(RECIEVE_TIMEOUT_MS);
this._zmqContext.setRcvHWM(HWM_MESSAGES);
    
this._socket = this._zmqContext.createSocket(ZMQ.SUB);
this._socket.setReceiveTimeOut(RECIEVE_TIMEOUT_MS);

When I don't read from the socket fast enough, I eventually get an exception from an I/O thread:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "iothread-2"
java.lang.OutOfMemoryError: Java heap space

Why is this happening if I set the HWM? Next, I try to close the context, so I set linger to zero:

this._zmqContext.setLinger(0);
this._zmqContext.close();

The close() call hangs indefinitely, similar to https://github.com/zeromq/jeromq/issues/543. This is probably because it is waiting for a socket to close, which never happens because the I/O thread died.

My best guess is that something in the iothread continues to consume heap even after the RcvHWM is reached. Is there something I could be doing wrong?
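One point worth keeping in mind for this question: in ZeroMQ the high-water mark limits the number of messages queued for each peer, not the number of bytes. The estimate below, with hypothetical names and numbers, shows how much payload an HWM can still permit; it ignores per-message and per-chunk overhead, so real usage is higher. It does not by itself explain this report, but it shows why an HWM alone does not cap heap usage.

```java
/**
 * Rough upper bound on queued payload, assuming the HWM counts messages
 * (not bytes) and applies separately to each peer's pipe.
 * Ignores per-message and per-chunk overhead, so real usage is higher.
 */
public class HwmMemoryEstimate {

    static long maxQueuedBytes(long hwmMessages, long avgMessageBytes, int peers) {
        // multiplyExact throws instead of silently overflowing
        return Math.multiplyExact(Math.multiplyExact(hwmMessages, avgMessageBytes), peers);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: HWM of 1000 messages, 1 MiB messages, 4 publishers.
        long bytes = maxQueuedBytes(1000, 1L << 20, 4);
        System.out.printf("up to %.1f GiB of payload%n", bytes / (double) (1L << 30));
    }
}
```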

Update: Here is where the close hangs.

"thread" #25 prio=5 os_prio=0 cpu=127100.79ms elapsed=6919.25s tid=0x00007ff81098d000 nid=0x2f06 runnable  [0x00007ff7c16b7000]
java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPoll.wait(Native Method)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
    - locked <0x00000007040b3cb0> (a sun.nio.ch.Util$2)
    - locked <0x00000007040b39c0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:136)
    at zmq.Signaler.waitEvent(Signaler.java:130)
    at zmq.Mailbox.recv(Mailbox.java:90)
    at zmq.Ctx.terminate(Ctx.java:249)
    at org.zeromq.ZMQ$Context.term(ZMQ.java:357)
    at org.zeromq.ZContext.destroy(ZContext.java:108)
    at org.zeromq.ZContext.close(ZContext.java:315)
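The trace shows close() blocked inside Ctx.terminate, waiting on its mailbox. The hypothesis above, that termination waits for a socket that never finishes closing because its I/O thread died, can be illustrated with a JDK-only analogy. This is not how JeroMQ implements termination: the class and method are hypothetical, and a CountDownLatch stands in for the acknowledgements the context waits for.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Analogy (plain JDK, not JeroMQ code): a context that waits until every
 * socket has acknowledged shutdown. If the thread responsible for one
 * acknowledgement has died, an unbounded wait never returns.
 */
public class TerminateAnalogy {

    /** Returns true if all sockets acknowledged within the timeout. */
    static boolean terminate(int sockets, int acknowledged, long timeoutMs)
            throws InterruptedException {
        CountDownLatch pending = new CountDownLatch(sockets);
        for (int i = 0; i < acknowledged; i++) {
            pending.countDown(); // a live I/O thread confirms its socket is closed
        }
        // The reported close() hangs indefinitely; here the wait is bounded
        // so that the missing acknowledgement shows up as a false result.
        return pending.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("all I/O threads alive: " + terminate(2, 2, 100));
        System.out.println("one I/O thread died:   " + terminate(2, 1, 100));
    }
}
```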

chadj2, Aug 18 '20 16:08