zookeeper
zookeeper copied to clipboard
ZOOKEEPER-3425: ranking the ttl by expireTime asc for the performance
-
Ranking the ttl by expireTime asc will get the following two benefits(Overall using space to swap time):
- the guarantee of timing order, t1 will disappear before the t2 when t1's expired timestamp < t2's in the same round.
- avoid the inefficient search in the
getCandidates()
. If we have 1000+(even the 10000+) ttl nodes which will disappear two hours later, every iteration it will check whether all the ttls node has expired. With ranking, it will break/jump quickly. - this design was inspired by the hadoop's client LeaseManager:
-
A UT:
testTTLTreeSetCollection
was added to assert the behaviour of the TreeSet Collection:ttls
. -
BTW, let's revisit the TimeTask's
scheduleAtFixedRate
strategy. the TimeTask will run every minute(with the defalutcheckIntervalMs
value). when the time current round spends is more than one minute, it will waiting for the current round being finished, then starting another round. That's OK. -
more details in the ZOOKEEPER-3425
@maoling I like the improvement part: sorting TTLs by expiry time makes sense to me, but why do you want to disable children creation?
@Randgalt You might want to take a look at this patch.
TTL nodes must be able to have children. I'm -1 on this part of the PR - it's a non-starter TBH and completely changes what a TTL node is and how it's used.
Heavy -1 on this. This would break users such as Elastic search cloud. TTL nodes were designed to have children.
@anmolnar @Randgalt Yep, we should not break the compatibility, I have deleted the changes about the semantics of TTL and focus on the performance issue by a force update. PTAL.
Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1648/
Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1649/
Here's an alternate implementation that's simpler (in my view). However, I need to think more about the maintenance of the two maps. I'm not certain this is thread safe. Finally, is this optimization needed? This smells like a premature optimization to me. Has anyone seen issues with this in production installations? The TTLs are searched in a background thread and shouldn't affect foreground performance very much.
https://github.com/apache/zookeeper/compare/master...Randgalt:ZOOKEEPER-3425-order-ttls-for-performance
update: I've verified the thread safety of the two maps and updated the code to make sure it's properly synchronized.
This new TTLManager solves thread safety however it hurts concurrency. In the master implementation, the TTL map is concurrent (via the ConcurrentMap) and allows for multi-thread mutation. This class synchronizes the entire TTL map making access serial for all the threads.
I'm, in general, still highly -1 on this change as it feels like a premature optimization.
Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1651/
@Randgalt
- Great review work and insight. I take a close look at the patch you provide, learn a lot from it.
- I don't insist on this improvement, not sure whether it's a premature optimization. But I believe this PR has opened an interesting discussion :). I really like this atmosphere.
If this patch is purely performance optimisation (I absolutely support that a single patch should focus on a single thing at a time), I believe we should see some test cases and numbers to prove the benefits.
I second @Randgalt 's concern: I don't like that we have to synchronise on entire TTLManager
. Hence I'd like to see some perf tests on different workloads: I suspect this implementation performs worse with clients that create / delete lots of TTLs.
On the other hand - and I'm not sure about this -, but if writes in ZooKeeper are serialised, do we need to care about synchronisation? Which threads are possibly accessing TTLManager
at the same time?
if writes in ZooKeeper are serialised, do we need to care about synchronisation?
I've wondered about that myself. The Watcher Manager has a bunch of synchronization (and other parts of the code too). TBH - I don't know enough about it and have always followed other synchronized blocks in the ZK code.