ZOOKEEPER-4858: Remove the lock contention between snapshotting and the sync operation
@eolivelli can I get a review for this? Thanks
Thanks for reviewing it, @anmolnar.
Yes, we would still want exclusive locking between snapshot and restore to make sure data tree is not replaced by restoring operation while snapshotting is in progress.
added a specific lock just for snapshot and restore operations. @anmolnar would you mind taking a look at it. Thanks.
@anmolnar I've addressed your comment. Can you take a quick look at it? Thanks.
Are you sure that we don't need to synchronize between snapshotting and sync()?
@kezhuw Thoughts?
We ran into perf issue when had sync on the ZookeeperServer object. Whenever snapshot was taken, the sync operations were queued up in the outstanding request queue and the sync() operation latency was largely increased. (edited) There was no synchronize between snapshotting and sync() before. The intention of adding the synchronized was to takeSnapshot() and restore() was to sync between the two operations. I didn’t realize the sync() call also requires ZookeeperServer object lock.
When has synchronized been added to take/restore snapshot?
Are you saying that sync() was already synchronized which should prevent multiple sync's running at the same time, but accidentally you introduced snapshotting into this lock?
When has synchronized been added to take/restore snapshot?
synchronized on the ZookeeperServer object was added when we introduced remote backup and restore feature, specifically https://github.com/apache/zookeeper/pull/1943
Are you saying that sync() was already synchronized which should prevent multiple sync's running at the same time, but accidentally you introduced snapshotting into this lock?
Yes. I didn't know that sync() is synchronized on FollowerZookeeperServer, which means it's also synchronized onZookeeperServer object. That's why we had lock contention between takingSnapshot() and sync().
LGTM
it causes lock contention on the ZookeeperServer object with the sync operation.
Does anyone have any cue about why
syncgetsynchronized? I saw thatsyncis called only in one thread andpendingSyncsis concurrent safe.
I have no idea either. I think we could remove synchronize from sync() too. Or remove it only from sync().
ZOOKEEPER-136 added synchronized to sync()
Thanks @anmolnar and @kezhuw
I saw that sync is called only in one thread and pendingSyncs is concurrent safe. I think we could remove synchronize from sync() too. Or remove it only from sync().
sync() is synchronized on the ZookeeperServer object, which is accessed by multiple-threads. Through inheritance and composition, the sync() operation is currently synchronized with following API calls.
ZookeeperServer.enqueueRequest(),
ZookeeperServer.submitRequestNow()
ZookeeperServer.startup()
ZookeeperServer.shutdown()
Learner.syncWithLeader()
sync() is used for strong consistency of not getting stale data in the read after write scenario. Not sure if it's safe to remove synchronization from it.
@anmolnar @kezhuw The PR is good for merge now, right? Thanks.