[CURATOR-437] zookeeper connection leak when session expires.

Open jira-importer opened this issue 8 years ago • 2 comments

https://github.com/apache/curator/blob/master/curator-client/src/main/java/org/apache/curator/utils/InjectSessionExpiration.java#L97

When the session expires, Curator's session-expiration injection sets the ZooKeeper state to CLOSED without shutting down the threads associated with the ZooKeeper client.

Once the state is set to CLOSED, ZooKeeper.close() can no longer release resources properly, which leads to memory and connection leaks.
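A toy model of the failure mode described above, as a minimal sketch. `ToyClient`, `State`, and `resourcesReleased` are hypothetical names, not ZooKeeper or Curator classes. The sketch assumes that close() returns early when the client state is no longer alive. Whether the real client then leaks depends on what else tears down its threads and socket.

```scala
// Toy model only: hypothetical stand-ins, not ZooKeeper's real code.
object CloseGuardSketch {
  sealed trait State { def isAlive: Boolean }
  case object Connected extends State { val isAlive = true }
  case object Closed extends State { val isAlive = false }

  // Stand-in for a client that owns a socket and background threads.
  final class ToyClient {
    @volatile var state: State = Connected
    @volatile var resourcesReleased: Boolean = false

    // Assumed guard: close() does nothing when the state is not alive.
    def close(): Unit = {
      if (!state.isAlive) return
      resourcesReleased = true // pretend threads and socket are released here
      state = Closed
    }
  }

  def main(args: Array[String]): Unit = {
    val normal = new ToyClient
    normal.close()
    println(s"normal close released resources: ${normal.resourcesReleased}") // true

    val injected = new ToyClient
    injected.state = Closed // state forced to CLOSED from outside before close()
    injected.close()
    println(s"close after forced CLOSED released resources: ${injected.resourcesReleased}") // false
  }
}
```

In this model the second client never reaches the cleanup branch, which is the shape of the leak being reported.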

To reproduce: create a Curator client, shut down the ZK server, wait for the session to time out, then restart the ZK server. There will be two ZooKeeper instances and two connections to the server.


Originally reported by zealot0630, imported from: zookeeper connection leak when session expires.
  • status: Open
  • priority: Major
  • resolution: Unresolved
  • imported: 2025-01-21

jira-importer, Oct 19 '17 12:10

randgalt:

Setting the ZK state to CLOSED is not an issue because the ClientCnxn is set to exit (via event of death). I started a ZK instance, had some code call injectSessionExpiration(). The number of connections reported by 'srvr' stays consistent - i.e. I don't see any extra connections. Can you provide a test that shows the issue?
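For anyone who wants to repeat the 'srvr' check, here is a minimal sketch using only the JDK and the Scala standard library. The object name, the `localhost:2181` default, and the timeout are assumptions for illustration. It also assumes the server accepts the 'srvr' four-letter command and that the reply contains a `Connections: N` line. The probe's own socket may be included in the count, so compare readings before and after the expiration rather than relying on the absolute value.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object SrvrConnections {
  // Extracts N from the "Connections: N" line of a 'srvr' reply, if present.
  def parseConnections(response: String): Option[Int] =
    response.split("\n")
      .map(_.trim)
      .collectFirst { case l if l.startsWith("Connections:") => l.stripPrefix("Connections:").trim }
      .flatMap(s => Try(s.toInt).toOption)

  // Sends the four-letter command 'srvr' and returns the raw text reply.
  def srvr(host: String, port: Int, timeoutMs: Int = 3000): String = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      socket.setSoTimeout(timeoutMs)
      val out = socket.getOutputStream
      out.write("srvr".getBytes("US-ASCII"))
      out.flush()
      // The server closes the connection after replying, so read until EOF.
      scala.io.Source.fromInputStream(socket.getInputStream, "UTF-8").mkString
    } finally socket.close()
  }

  def main(args: Array[String]): Unit = {
    // Host and port are placeholders; point them at the server under test.
    val host = args.headOption.getOrElse("localhost")
    val port = args.lift(1).map(_.toInt).getOrElse(2181)
    println(parseConnections(srvr(host, port)))
  }
}
```

Running it once before and once after calling injectSessionExpiration() gives two numbers to compare.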

jira-importer, Oct 19 '17 13:10

zealot0630:

Scala script to reproduce:


// Enable timestamped debug logging for slf4j-simple before any logger is created.
System.setProperty("org.slf4j.simpleLogger.defaultLogLevel", "debug")
System.setProperty("org.slf4j.simpleLogger.showDateTime", "true")
System.setProperty("org.slf4j.simpleLogger.dateTimeFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSZ")
// Load the logging, ZooKeeper, and Curator jars into the Scala REPL.
:require /usr/share/java/slf4j-api.jar
:require /usr/share/java/slf4j-simple-1.7.22.jar
:require zookeeper-3.5.3-beta.jar
:require curator-framework-4.0.0.jar
:require curator-client-4.0.0.jar
:require guava-18.0.jar
// Connect to the three-node ensemble, retrying forever at 3-second intervals.
val cf = org.apache.curator.framework.CuratorFrameworkFactory.newClient("10.185.0.93:2181,10.185.0.94:2181,10.185.0.95:2181", new org.apache.curator.retry.RetryForever(3000))
cf.start()



I also attached the full debug log. In the log, I found two sessions:

2017-10-20T16:37:58.697+0800 [main-SendThread(10.185.0.94:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x201bb467c360004 after 0ms
2017-10-20T16:37:59.842+0800 [main-SendThread(10.185.0.94:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x201bb48199d0003 after 0ms

netstat also shows two TCP connections:

ESTAB      0      0       ::ffff:10.185.0.81:39132       ::ffff:10.185.0.94:2181users:(("java",pid=593,fd=57))
ESTAB      0      0       ::ffff:10.185.0.81:39118       ::ffff:10.185.0.94:2181users:(("java",pid=593,fd=60))

It isn't 100% reproducible: I tried three times and reproduced it twice.

I can also upload a heap dump, but it is around 100 MB, which may be too big.

jira-importer, Oct 20 '17 08:10