
Memory leak when slave server unavailable

preffect opened this issue 6 years ago • 10 comments

When running replication-manager as a service there appears to be a memory leak when the slave server is not available. I can easily reproduce this and see a consistent ~6MB / hour and have seen it get as high as 1GB total.

Note: I am aware there is a memory leak in http-auth, but I have both that and the http-server turned off.

Is this a known issue? Is there anything else I can add to this to help track it down?

MariaDB version: 10.1.22
Replication manager version: 2.0.0-11-gc3654c71
OS: CentOS Linux release 7.3.1611 (Core)

TOPOLOGY CONFIG

--------

db-servers-hosts = "127.0.0.1:3306,remote.server.com:3306"
db-servers-credential = "user:pass"
replication-credential = "slave_user:pass"
db-servers-connect-timeout = 1
db-servers-prefered-master = "127.0.0.1:3306"

HTTP

-------

http-server = false
http-bind-address = "0.0.0.0"
http-port = "10001"
http-root = "/usr/share/replication-manager/dashboard"
http-auth = false
http-session-lifetime = 3600
http-bootstrap-button = false

preffect avatar May 10 '18 16:05 preffect

We are not aware of such issues, we'll try to reproduce it ASAP. Thanks!

tanji avatar May 10 '18 19:05 tanji

Hi,

Can you try to reproduce with the latest 2.0 build? If you can still reproduce with the latest build, then you can send me 2 extracts of
http://127.0.0.1:10001/debug/pprof/heap taken at different times (waiting for the memory to grow before the second extract).

tx /svar

svaroqui avatar May 11 '18 13:05 svaroqui

Also, the issue with auth is removed in 2.1 by using https://server:10005/, which is a properly secured JWT login.

svaroqui avatar May 11 '18 13:05 svaroqui

Hi svar,

I've upgraded to 2.0.0-21-g6c87cf3f and the memory leak still exists. I've attached 3 heap logs, each roughly an hour apart.

I've also attached a running counter of memory used by the replication-manager-osc monitor. As you can see the monitor memory usage increases in chunks of roughly 1MB every 10 minutes.

heap_2018.05.11_11:52.gz heap_2018.05.11_10:35.gz heap_2018.05.11_09:19.gz replication-manager-memory.log

preffect avatar May 11 '18 18:05 preffect

Re,

I'm not able to reproduce yet, even trying hard on CentOS. Can you send us the content of the file clusterstate.json under /var/lib/replication/cluster-name/?

tx /svar

svaroqui avatar May 12 '18 10:05 svaroqui

Doesn't look like there's anything terribly interesting...

{
	"servers": "127.0.0.1:3306,remote.server.com:3306",
	"crashes": null,
	"sla": {
		"firsttime": 1523297297,
		"uptime": 0,
		"uptimeFailable": 0,
		"uptimeSemisync": 0
	}
}

On a positive note, I've been running this on 40 other server pairs for a month now with no issues. It definitely seems to be an issue only with servers that can't connect to their failover. Actually, the two servers I'm seeing the memory leak on were accidentally set up with replication-manager even though they don't have failover servers. Maybe the leak is something that only happens during the initial setup?

preffect avatar May 15 '18 16:05 preffect

I'm testing a patch on 2.1. When it's ready, would you like to test 2.1 to see if this is reproducible there as well?


svaroqui avatar May 15 '18 17:05 svaroqui

Sure. Will it be released soon? I'll watch for it.

preffect avatar May 15 '18 22:05 preffect

Got some news! I possibly found a cause for this issue! If you get time, could you confirm that using a non-existent IP address instead of a hostname fixes the leak? That would validate my finding!

I'll work on a fix and let you know when it's available!

svaroqui avatar May 24 '18 10:05 svaroqui
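For what it's worth, one generic pattern that would produce exactly this symptom (steady growth while the slave is down, growth stopping when it comes back) is a health-check loop that records every failed connection attempt in an unbounded structure. This is purely a hypothetical illustration of the failure mode, not replication-manager's actual code; the `monitor` type and the TEST-NET address are mine:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// monitor mimics a health-check loop against an unreachable slave.
// If each failure is recorded in a slice that is never trimmed, memory
// grows for as long as the slave stays down, and stops growing (but is
// not given back) once the slave returns.
type monitor struct {
	failures []error // leak: unbounded failure history
}

func (m *monitor) check(addr string) {
	conn, err := net.DialTimeout("tcp", addr, 10*time.Millisecond)
	if err != nil {
		m.failures = append(m.failures, err)
		return
	}
	conn.Close()
}

func main() {
	m := &monitor{}
	// 203.0.113.1 is a TEST-NET-3 address, guaranteed unreachable.
	for i := 0; i < 20; i++ {
		m.check("203.0.113.1:3306")
	}
	fmt.Printf("retained %d failure records\n", len(m.failures))
}
```

A heap diff between the two pprof extracts would point at whichever allocation site is actually accumulating, if it is something like this.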

I tried setting the replication slave to an IP that wasn't assigned to anything, but I still see the memory leak.

I was able to confirm that the leak also occurs on a server pair that was fully set up. I shut down one of my slave servers for a little over half an hour, and memory used went from 31468K to 34636K in three ~1MB steps.

After turning the slave server back on again, the leak stopped, but it did not recover the lost memory. It seems the only way to recover is to restart replication-manager.

preffect avatar May 25 '18 20:05 preffect