[CURATOR-437] zookeeper connection leak when session expires.
When a session expires, Curator's inject logic sets the ZooKeeper state to CLOSED without shutting down the threads associated with the ZooKeeper instance.
Once the state is CLOSED, ZooKeeper.close() can no longer release resources properly, which leads to a memory and connection leak.
To reproduce: create a Curator client, shut down the zk server, wait for the session to time out, then restart the zk server. There will be two ZooKeeper instances and two connections to the server.
Originally reported by zealot0630, imported from: zookeeper connection leak when session expires.
- status: Open
- priority: Major
- resolution: Unresolved
- imported: 2025-01-21
Setting the ZK state to CLOSED is not an issue because the ClientCnxn is set to exit (via event of death). I started a ZK instance, had some code call injectSessionExpiration(). The number of connections reported by 'srvr' stays consistent - i.e. I don't see any extra connections. Can you provide a test that shows the issue?
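For anyone who wants to repeat the 'srvr' check above from a script rather than by hand, here is a minimal sketch. It assumes the server accepts the `srvr` four-letter command on its client port and that the reply contains a `Connections: N` line; the helper names `srvr` and `parseConnections` are made up for this example and are not part of Curator or ZooKeeper.

```scala
import java.net.Socket
import scala.io.Source
import scala.util.Try

// Hypothetical helper: pulls the number out of the "Connections: N" line
// of a 'srvr' reply. Returns None if the line is missing or not numeric.
def parseConnections(srvrOutput: String): Option[Int] =
  srvrOutput.split("\n").toSeq
    .map(_.trim)
    .find(_.startsWith("Connections:"))
    .flatMap(line => Try(line.stripPrefix("Connections:").trim.toInt).toOption)

// Hypothetical helper: sends the four-letter command 'srvr' to a ZooKeeper
// server and returns the raw text reply. The port default is an assumption.
def srvr(host: String, port: Int = 2181): String = {
  val socket = new Socket(host, port)
  try {
    val out = socket.getOutputStream
    out.write("srvr".getBytes("US-ASCII"))
    out.flush()
    // The server closes the connection after replying, so read until EOF.
    Source.fromInputStream(socket.getInputStream, "UTF-8").mkString
  } finally socket.close()
}
```

Calling `parseConnections(srvr(host))` before and after the session expires gives two numbers to compare. Keep in mind that the probe itself opens a connection to the server while it runs.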
Scala script to reproduce:
System.setProperty("org.slf4j.simpleLogger.defaultLogLevel", "debug")
System.setProperty("org.slf4j.simpleLogger.showDateTime", "true")
System.setProperty("org.slf4j.simpleLogger.dateTimeFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSZ")
:require /usr/share/java/slf4j-api.jar
:require /usr/share/java/slf4j-simple-1.7.22.jar
:require zookeeper-3.5.3-beta.jar
:require curator-framework-4.0.0.jar
:require curator-client-4.0.0.jar
:require guava-18.0.jar
val cf = org.apache.curator.framework.CuratorFrameworkFactory.newClient("10.185.0.93:2181,10.185.0.94:2181,10.185.0.95:2181", new org.apache.curator.retry.RetryForever(3000))
cf.start()
I have also attached the full debug log. In the log, I found two sessions:
2017-10-20T16:37:58.697+0800 [main-SendThread(10.185.0.94:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x201bb467c360004 after 0ms
2017-10-20T16:37:59.842+0800 [main-SendThread(10.185.0.94:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x201bb48199d0003 after 0ms
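To check a longer debug log for the same symptom, something like the following could list the distinct session ids that are still being pinged. It is only a sketch: it assumes the `sessionid: 0x...` format seen in the ClientCnxn lines above, and `distinctSessionIds` is a name invented for this example.

```scala
// Matches the session id in ClientCnxn debug lines such as
// "... Got ping response for sessionid: 0x201bb467c360004 after 0ms".
val SessionIdPattern = """sessionid: (0x[0-9a-fA-F]+)""".r

// Hypothetical helper: returns each session id once, in order of first appearance.
def distinctSessionIds(logLines: Seq[String]): Seq[String] =
  logLines
    .flatMap(line => SessionIdPattern.findFirstMatchIn(line).map(_.group(1)))
    .distinct
```

If more than one id keeps showing up in ping responses after the reconnect, the client is most likely holding more than one live session, which matches what is reported here.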
netstat also shows two TCP connections:
ESTAB 0 0 ::ffff:10.185.0.81:39132 ::ffff:10.185.0.94:2181 users:(("java",pid=593,fd=57))
ESTAB 0 0 ::ffff:10.185.0.81:39118 ::ffff:10.185.0.94:2181 users:(("java",pid=593,fd=60))
It isn't 100% reproducible. I tried three times and reproduced it twice.
I can also upload a heap dump, but it is around 100 MB, which may be too big.