vertx-zookeeper

Unexpected client session timed out

Open neterium opened this issue 5 years ago • 3 comments

I'm using version 3.6.3 with ZooKeeper as the cluster manager. I've started putting some pressure on our servers by increasing the workload. In terms of response time everything is OK: the event bus seems to absorb the traffic, there are no "Thread Blocked" events, etc. (the number of verticles is #cores - 1). However, after a while it looks like the connection to ZK is lost:

2019-06-12 10:15:49.981  WARN 1 --- [.internal:2181)] org.apache.zookeeper.ClientCnxn          : Client session timed out, have not heard from server in 34816ms for sessionid 0x100000048e50034
2019-06-12 10:15:49.981  WARN 1 --- [.internal:2181)] org.apache.zookeeper.ClientCnxn          : Client session timed out, have not heard from server in 34839ms for sessionid 0x100000048e50035
2019-06-12 10:15:49.981  WARN 1 --- [.internal:2181)] org.apache.zookeeper.ClientCnxn          : Client session timed out, have not heard from server in 34789ms for sessionid 0x100000048e50033
2019-06-12 10:15:49.981  WARN 1 --- [.internal:2181)] org.apache.zookeeper.ClientCnxn          : Client session timed out, have not heard from server in 29528ms for sessionid 0x100000048e5003b
2019-06-12 10:15:50.082  WARN 1 --- [tor-TreeCache-0] i.v.s.c.zookeeper.impl.ZKAsyncMultiMap   : connection to the zookeeper server have suspended.
2019-06-12 10:15:50.082 ERROR 1 --- [worker-thread-5] i.v.s.c.z.ZookeeperClusterManager        : java.lang.IllegalStateException: Not acquired
2019-06-12 10:15:51.954  WARN 1 --- [.internal:2181)] org.apache.zookeeper.ClientCnxn          : Unable to reconnect to ZooKeeper service, session 0x100000048e5003b has expired
2019-06-12 10:15:51.955 ERROR 1 --- [orker-thread-15] i.v.s.c.z.ZookeeperClusterManager        : java.lang.IllegalStateException: Not acquired
2019-06-12 10:15:51.955  WARN 1 --- [d-0-EventThread] org.apache.curator.ConnectionState       : Session expired event received
2019-06-12 10:15:51.957 ERROR 1 --- [tor-TreeCache-0] i.v.s.c.zookeeper.impl.ZKAsyncMultiMap   : connection to the zookeeper server have lost, all the temporary node will be remove.
2019-06-12 10:15:51.993  INFO 1 --- [ntloop-thread-0] i.v.s.c.zookeeper.impl.ZKAsyncMultiMap   : restore eventbus snapshot cache success.

How can I prevent this from happening?

Thanks

neterium avatar Jun 12 '19 10:06 neterium

Looks like it comes from a "stop the world" major GC at this time. I can finetune the GC settings, but is there a way to increase the session timeout?

neterium avatar Jun 12 '19 13:06 neterium

Hi

You can set the session timeout on the ZK server by changing the tickTime parameter.

tickTime: the length of a single tick, which is the basic time unit used by ZooKeeper, as measured in milliseconds. It is used to regulate heartbeats and timeouts. For example, the minimum session timeout will be two ticks.
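To make the relationship between tickTime and the session timeout concrete, here is a minimal sketch of the arithmetic. It assumes the server's minSessionTimeout and maxSessionTimeout are left at their defaults (2 and 20 ticks respectively) and that the server clamps whatever timeout the client requests into that range. The class and method names are hypothetical, made up for illustration only.

```java
public class SessionTimeoutBounds {

    // Default lower bound: 2 ticks (assumes minSessionTimeout is not set explicitly).
    static int minSessionTimeoutMs(int tickTimeMs) {
        return 2 * tickTimeMs;
    }

    // Default upper bound: 20 ticks (assumes maxSessionTimeout is not set explicitly).
    static int maxSessionTimeoutMs(int tickTimeMs) {
        return 20 * tickTimeMs;
    }

    // The timeout the session ends up with: the client's request clamped into [min, max].
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = minSessionTimeoutMs(tickTimeMs);
        int max = maxSessionTimeoutMs(tickTimeMs);
        return Math.max(min, Math.min(requestedMs, max));
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000; // illustrative value, not taken from this thread

        // A request inside the allowed range is granted as-is.
        System.out.println(negotiatedTimeoutMs(20_000, tickTimeMs));

        // A request above 20 ticks is capped by the server.
        System.out.println(negotiatedTimeoutMs(60_000, tickTimeMs));

        // Raising tickTime raises the cap, so the same request now fits.
        System.out.println(negotiatedTimeoutMs(60_000, 3000));
    }
}
```

In other words, raising the timeout usually takes two steps: the client has to request a larger value (in vertx-zookeeper this should be the sessionTimeout entry of the cluster manager's JSON configuration, but check the documentation for your version), and the server has to allow it, either through a larger tickTime or an explicit maxSessionTimeout.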


Stream Liu


stream-iori avatar Jun 12 '19 14:06 stream-iori

Looks like it comes from a "stop the world" major GC at this time. I can finetune the GC settings, but is there a way to increase the session timeout?

@neterium I'm puzzled by a ZK disconnect problem in my own project. Why do you think it comes from a major GC? Can a "stop the world" major GC really last for 29528ms?

I'd appreciate your reply :)
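One way to check whether the JVM itself stalls for that long, independently of ZooKeeper, is to measure how much a short sleep overshoots. The sketch below is a rough diagnostic, not something from vertx-zookeeper; the class name, the sampling interval, and the reporting threshold are all arbitrary choices. A large overshoot means the whole process was paused, which can be a stop-the-world GC but also swapping or CPU starvation, so the GC logs (for example `-Xlog:gc*` on JDK 9+ or `-XX:+PrintGCDetails` on JDK 8) remain the place to confirm the cause.

```java
import java.util.concurrent.TimeUnit;

public class PauseDetector {

    // How far an observed sleep ran past the requested duration, floored at zero.
    static long overshootMillis(long requestedMillis, long elapsedNanos) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        return Math.max(0L, elapsedMillis - requestedMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        final long intervalMillis = 100;    // sampling interval (arbitrary)
        final long thresholdMillis = 1_000; // only report stalls at least this long (arbitrary)
        final int samples = 20;             // keep the demo short; a real probe would loop forever

        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            Thread.sleep(intervalMillis);
            long stall = overshootMillis(intervalMillis, System.nanoTime() - start);
            if (stall >= thresholdMillis) {
                System.out.println("Process appears to have stalled for about " + stall + " ms");
            }
        }
    }
}
```

In a real service the loop would typically run on its own daemon thread for the lifetime of the process, so that reported stalls can be compared with the timestamps of the "Client session timed out" warnings.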

Viking18 avatar Jan 19 '20 07:01 Viking18