
Possible memory leak in ResponseSocket

Open mvburgh opened this issue 6 years ago • 30 comments

[screenshot]

My code is already minimal: only a RequestSocket and a ResponseSocket. I'm creating a RequestSocket every few seconds, based on the requests from a web controller.

Originally posted by @mvburgh in https://github.com/zeromq/netmq/issues/737#issuecomment-471332420
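
The reporter's code isn't shown here, but a minimal sketch of the pattern described might look like the following; the endpoint, message contents, and names are placeholders, not the actual application code.

```csharp
using System;
using NetMQ;
using NetMQ.Sockets;

class ReproSketch
{
    const string Endpoint = "tcp://localhost:5555"; // placeholder endpoint

    // Long-lived server side: a single ResponseSocket answering requests.
    static void RunServer()
    {
        using (var rep = new ResponseSocket())
        {
            rep.Bind("tcp://*:5555");
            while (true)
            {
                string request = rep.ReceiveFrameString();
                rep.SendFrame("ack: " + request);
            }
        }
    }

    // Called from the web controller: a short-lived RequestSocket per request,
    // created and disposed every few seconds.
    static string SendRequest(string payload)
    {
        using (var req = new RequestSocket())
        {
            req.Connect(Endpoint);
            req.SendFrame(payload);
            return req.ReceiveFrameString();
        }
    }
}
```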

mvburgh avatar Mar 28 '19 18:03 mvburgh

Anyone working on it?

KamranShahid avatar Mar 29 '19 14:03 KamranShahid

Not me personally. I had a quick glance at the code in the meantime, but could not see any obvious cause.

mvburgh avatar Apr 24 '19 07:04 mvburgh

[screenshot]

@mvburgh Did you make any progress on this since then?

Svisstack avatar Aug 23 '19 08:08 Svisstack

I will take a look next week; I'm on vacation this week.

Code that reproduces this would help.

somdoron avatar Aug 23 '19 09:08 somdoron

Also @svisstack, can you check with socket.Options.Linger set to zero? I suspect it might be the issue.
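
For reference, setting that option looks roughly like this; the socket type and endpoint are placeholders:

```csharp
using System;
using NetMQ;
using NetMQ.Sockets;

using (var socket = new ResponseSocket())
{
    // Linger controls how long queued outgoing messages are kept after Close/Dispose.
    // TimeSpan.Zero drops them immediately instead of buffering them for disconnected peers.
    socket.Options.Linger = TimeSpan.Zero;
    socket.Bind("tcp://*:5556");
    // ... normal receive/send loop ...
}
```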

somdoron avatar Aug 23 '19 09:08 somdoron

[screenshot]

The code causing this is very simple: just a PublisherSocket with ~10 connected subscribers:

[screenshot of the publisher setup code]
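
The screenshot itself isn't preserved here. As a hypothetical reconstruction of the kind of Start() method referenced later in the thread (class and field names are invented; only binding a PublisherSocket and returning the port is implied by the discussion):

```csharp
using NetMQ;
using NetMQ.Sockets;

class PublisherHost
{
    private PublisherSocket _publisher; // hypothetical field name

    public int Start()
    {
        _publisher = new PublisherSocket();
        // BindRandomPort picks a free ephemeral port and returns it, which would
        // explain the LastEndpoint "tcp://0.0.0.0:61584" seen in the options dump below.
        int port = _publisher.BindRandomPort("tcp://0.0.0.0");
        return port;
    }
}
```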

I'm not sure it's the same bug, as the initial report was related to the ResponseSocket.

On the Subscriber side, this bug does not exist.

From the memory dumps we can see that there are probably too many Pub+PubSession objects, and along with those Pipe, YPipe, and YQueue objects, but all the memory is allocated in YQueue+Chunk<Msg>.

Svisstack avatar Aug 23 '19 09:08 Svisstack

@somdoron I confirm that Linger equals {00:00:00} at the end of the Start() function in the snippet provided above (Start(): return port;).

publisher.Options {NetMQ.SocketOptions}
Affinity: 0
Backlog: 100
DelayAttachOnConnect: false
DisableTimeWait: false
Endian: Big
IPv4Only: true
Identity: null
LastEndpoint: "tcp://0.0.0.0:61584"
LastPeerRoutingId: null
Linger: {00:00:00}
MaxMsgSize: -1
MulticastHops: 1
MulticastRate: 100
MulticastRecoveryInterval: {00:00:10}
PgmMaxTransportServiceDataUnitLength: 'publisher.Options.PgmMaxTransportServiceDataUnitLength' threw an exception of type 'NetMQ.InvalidException'
ReceiveBuffer: 0
ReceiveHighWatermark: 1000
ReceiveLowWatermark: 0
ReceiveMore: false
ReconnectInterval: {00:00:00.1000000}
ReconnectIntervalMax: {00:00:00}
SendBuffer: 0
SendHighWatermark: 0
SendLowWatermark: 0
TcpKeepalive: false
TcpKeepaliveIdle: {-00:00:00.0010000}
TcpKeepaliveInterval: {-00:00:00.0010000}

Svisstack avatar Aug 23 '19 10:08 Svisstack

Do the subscribers come and go frequently? It seems like setting linger to zero or a few seconds will solve it.

somdoron avatar Aug 23 '19 10:08 somdoron

Thanks, do the subscribers come and go? Can you check who is referencing the PubSession?

somdoron avatar Aug 23 '19 10:08 somdoron

@somdoron Take a look at the incoming reference chart.

[screenshot: incoming references chart]

In my use case the subscribers should not come and go frequently, but there could be a bug on my side causing them to come and go; I'm analyzing that at the moment.

Svisstack avatar Aug 23 '19 10:08 Svisstack

I'm using the 4.0.0.239-pre version.

Svisstack avatar Aug 23 '19 10:08 Svisstack

Can you send me the report? Which application are you using?

somdoron AT gmail DOT com

I'm not in front of a computer this week, but I will take a look beginning of next week.

somdoron avatar Aug 23 '19 10:08 somdoron

@somdoron No problem. I actually found an interesting fact: the leak is visible only on nodes where there is no communication activity between publisher and subscriber (silence); the silence itself is fine from the application's perspective.

Svisstack avatar Aug 23 '19 10:08 Svisstack

Can you extend this list:

https://user-images.githubusercontent.com/864295/63584247-d162e480-c59c-11e9-8480-9f4bd1532964.png

I want to see the root object causing the memory leak

somdoron avatar Aug 23 '19 10:08 somdoron

Also, can you show the incoming reference to the pipe class?

somdoron avatar Aug 23 '19 10:08 somdoron

[screenshot: incoming references to the Pipe class]

It looks like the Pipe is also referenced by Pub+Sub; however, I don't know whether it's the same instance.

Svisstack avatar Aug 23 '19 10:08 Svisstack

[screenshot: paths to the root]

@somdoron Paths to the root.

Svisstack avatar Aug 23 '19 10:08 Svisstack

Funny, I just figured it out myself.

At least in this case it is not a bug.

Once a message is sent, everything will be freed.

From the memory picture I saw a pending command holding the reference and causing the issue.

To avoid the issue you can call socket.Poll with a zero timespan once in a while. This will also process pending commands.
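
A minimal sketch of that workaround on the publisher side; the housekeeping interval and bind address are arbitrary:

```csharp
using System;
using System.Threading;
using NetMQ;
using NetMQ.Sockets;

using (var publisher = new PublisherSocket())
{
    publisher.BindRandomPort("tcp://0.0.0.0");

    while (true)
    {
        // Poll with a zero timeout returns immediately, but it lets the socket
        // process pending commands (such as pipe termination for departed
        // subscribers), so queued Chunk<Msg> buffers can be released even
        // when no messages are being published.
        publisher.Poll(TimeSpan.Zero);

        Thread.Sleep(TimeSpan.FromSeconds(1)); // arbitrary housekeeping interval
    }
}
```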

Anyway, I think you have a case where subscribers come and go frequently.

somdoron avatar Aug 23 '19 11:08 somdoron

Thanks. @somdoron, I appreciate the effort and in-depth knowledge of this project.

Have a nice vacation.

Yes, looking at netstat, I could have the come-and-go issue.

Svisstack avatar Aug 23 '19 11:08 Svisstack

Do the subscribers come and go frequently? It seems like setting linger to zero or a few seconds will solve it.

In my case they come and go every few seconds, as they are web API requests.

mvburgh avatar Aug 23 '19 14:08 mvburgh

@mvburgh, I will try to reproduce next week. Only request/response sockets? Are you using a proxy? Do you happen to have a memory profiler report?

somdoron avatar Aug 23 '19 15:08 somdoron

No proxy here; it runs between a Windows service and a website for me. I don't have a profiler report at hand.

mvburgh avatar Aug 24 '19 14:08 mvburgh

I have the majordomo pattern implemented, with the broker in one Windows service (.NET Core 2.1) and the worker app in another Windows service (.NET Core 2.1). https://github.com/NetMQ/Samples/tree/master/src/Majordomo
In the worker Windows service there are 16/17 different types of worker, and each type of worker can have multiple instances. What I was seeing is that when I assign 10 workers of each type, my broker application's memory increases from time to time.

It is probably due to the default heartbeat time. Now I am trying to set the default heartbeat time on the worker side to 10 seconds and on the broker to 15 seconds.

Memory profiling is a bit difficult in my case, as I have set up the workers and broker in separate applications for future scalability.
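
For illustration, a sketch of that heartbeat tuning on the worker side. The MDPWorker type comes from the linked Samples repository; its constructor arguments and the HeartbeatDelay property name are assumptions and may differ in the version actually in use:

```csharp
using System;
using MajordomoProtocol; // namespace of the NetMQ Majordomo sample (assumed)

// Worker heartbeat (10 s) kept shorter than the broker's expiry window (15 s),
// so the broker does not consider idle-but-healthy workers dead.
var worker = new MDPWorker("tcp://broker-host:5555", "SomeService"); // hypothetical endpoint and service name
worker.HeartbeatDelay = TimeSpan.FromSeconds(10); // assumed property name
// ... request/reply loop as in the sample ...
```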

KamranShahid avatar Sep 14 '19 07:09 KamranShahid

@ReneOlsthoorn During the time the memory increases to 3 GB, are you still sending messages? Can it be that it happens only during silent periods?

somdoron avatar Sep 15 '19 11:09 somdoron

Can you share the test program?

On Sun, Sep 15, 2019, 17:48 ReneOlsthoorn [email protected] wrote:

@somdoron https://github.com/somdoron Yes, the server keeps running and messages are sent. During silent periods no memory increase occurs. The memory increase is gradual: about one GB every 4 hours. It depends on how many consumers are connecting and disconnecting. I've made a test program that connects and disconnects; the memory leak is visible there as well. I've cloned the git sources, so maybe I can see where the problem is.


somdoron avatar Sep 15 '19 14:09 somdoron

Can you share a memory profiler snapshot? That would help a lot.

somdoron avatar Sep 16 '19 05:09 somdoron

Doron and others, the memory leak I was investigating was in our own product. My apologies for posting when it was not clear where the problem came from. I've deleted my comments so new users don't get a wrong impression about NetMQ. Keep up the good work!

ReneOlsthoorn avatar Sep 18 '19 06:09 ReneOlsthoorn

@somdoron I have spent some more time on this last week, but neither setting Linger to 0 nor calling socket.Poll() every now and then gives a better result. The increase stays in YQueue+Chunk and does not get freed over time.

mvburgh avatar Dec 12 '19 10:12 mvburgh

We are also experiencing something similar in our app. We do not see a leak when we have one server and one client communicating via request/response sockets. However, if another client tries to connect to the server while it is already serving a client, the server leaks memory. The way we have it working is that the server can only serve one client, so when a new client connects, it sends a message telling that client it cannot communicate, and that's pretty much it. Once the client receives that message, it disconnects.
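
A hedged sketch of that handshake on the server side; the message names, port, and session tracking are invented for illustration and are not the actual application code:

```csharp
using System;
using NetMQ;
using NetMQ.Sockets;

// Server: a single ResponseSocket that accepts one "session" at a time.
// Clients send "HELLO <id>" to start and "BYE <id>" when they are done;
// a second client asking for a session while one is active gets "BUSY"
// and is expected to disconnect.
using (var server = new ResponseSocket("@tcp://*:5557")) // placeholder port
{
    string activeClient = null;

    while (true)
    {
        string[] parts = server.ReceiveFrameString().Split(' ');

        if (parts.Length == 2 && parts[0] == "HELLO")
        {
            if (activeClient == null || activeClient == parts[1])
            {
                activeClient = parts[1];
                server.SendFrame("OK");
            }
            else
            {
                // Already serving someone else: tell the newcomer to go away.
                server.SendFrame("BUSY");
            }
        }
        else if (parts.Length == 2 && parts[0] == "BYE" && parts[1] == activeClient)
        {
            activeClient = null;
            server.SendFrame("OK");
        }
        else
        {
            server.SendFrame("ERROR");
        }
    }
}
```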

manu-st avatar Nov 18 '20 07:11 manu-st

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

stale[bot] avatar Apr 17 '22 03:04 stale[bot]