zookeeper icon indicating copy to clipboard operation
zookeeper copied to clipboard

ZOOKEEPER-3964: Introduce RocksDB snap and implement change data capture to enable incremental snapshot

Open hanm opened this issue 4 years ago • 9 comments

This is the first step of enabling on disk storage engine for ZooKeeper by extending the existing Snap interface and implement a RocksDB backed snapshot. Comparing to file based snapshot, RocksDB based snapshot is superior for big in memory data tree as it supports incremental snapshot by only serializing the changed data between snapshots.

High level overview:

  • Extend Snap interface so every thing that's need serialize has a presence on the interface.
  • Implement RocksDB based snapshot, and bidirectional conversations between File based snapshot and RocksDB snapshot, for back / forward compatibility.
  • Change data capture is implemented by buffering transactions applied to data tree, and applied to RocksDB when processing each transaction. An incremental snapshot thus only requires RocksDB flush. ZK will always do a full snapshot when first loading the data tree during the start process.
  • By default, this feature is disabled. Users need opt in by explicitly specify a Java system property to instantiate RocksDBSnap at runtime.

This work is based on top of the patch attached to ZOOKEEPER-3783 (kudos to Fangmin and co at FB), with some bug / test fixes and adjustment so it can cleanly apply to master branch.

hanm avatar Oct 07 '20 23:10 hanm

Great work. I left a few code style comments, please take a look.

Also can you please follow up with a documentation update?

Thanks for review, @eolivelli . Patch updated.

For documentation, I will do it in ZOOKEEPER-3965.

hanm avatar Nov 15 '20 03:11 hanm

I believe that this feature must be of uppermost priority of current community work. I'll spend most of my time to review, test and benchmark this feature.

maoling avatar Apr 04 '21 09:04 maoling

Great stuff @hanm ! I'm also planning to review this. What are the large number of THREAD_SAFETY violations from muse-dev bot?

anmolnar avatar Apr 04 '21 15:04 anmolnar

What are the large number of THREAD_SAFETY violations from muse-dev bot?

These are reports generated by the static analysis tool used by the bot. Some of these are false negatives, I'll have a pass to address these warnings and rebase this patch so it's ready to be reviewed.

hanm avatar Apr 21 '21 02:04 hanm

  • This branch has conflicts that must be resolved ???
  • reviewing without a single halt :)

maoling avatar Apr 22 '21 14:04 maoling

Rebase patch with latest master to resolve merge conflicts. I will have another pass to see what to do with the musedev bot.

hanm avatar May 21 '21 05:05 hanm

@hanm I'm one of the developers of Muse bot and am watching this play out. If, at the end of the day, you have thoughts or opinions then we're all ears. These particular reports are produced by a tool named Infer, maintained by Facebook. It is usually quite accurate about thread safety issues, though there is the occasional codebase that just breaks its brain.

TomMD avatar May 21 '21 05:05 TomMD

@hanm I'm one of the developers of Muse bot and am watching this play out. If, at the end of the day, you have thoughts or opinions then we're all ears. These particular reports are produced by a tool named Infer, maintained by Facebook. It is usually quite accurate about thread safety issues, though there is the occasional codebase that just breaks its brain.

Thanks for the notes, I do believe it's a good thing to introduce a static analysis tool for the ZooKeeper code base here as well. I will report any false reports later if I find any.

hanm avatar May 24 '21 16:05 hanm

Just had a pass to address muse-dev reports.

hanm avatar May 28 '21 22:05 hanm