
ZOOKEEPER-3318: [CLI way] Add a complete backup mechanism for zookeeper internal

Open maoling opened this issue 5 years ago • 4 comments

  • The takeSnapshot API is just like sync: it only supports asynchronous calls.

For non-real-time backup, write a new tool/shell script, zkBackup.sh, which is the reverse process of zkCleanup.sh.
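As a rough illustration of such an offline backup, here is a minimal sketch that copies snapshot files out of a snapshot directory with FileChannel.transferTo. It is not the code from the PR: the class name SnapshotBackup, the method names, and the command-line arguments are all hypothetical, and the only ZooKeeper-specific assumption is that snapshot files are named snapshot.<zxid>, as in the stack trace in this issue.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: copies snapshot.* files to a backup directory.
public class SnapshotBackup {

    /** Copies one file with FileChannel.transferTo, looping until every byte is transferred. */
    static void copyWithTransferTo(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING,
                     StandardOpenOption.WRITE)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so keep going until done.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
        }
    }

    /** Copies every snapshot.* file from snapDir into backupDir and returns the sorted file names. */
    static List<String> backupSnapshots(Path snapDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(snapDir, "snapshot.*")) {
            for (Path snapshot : stream) {
                copyWithTransferTo(snapshot, backupDir.resolve(snapshot.getFileName()));
                copied.add(snapshot.getFileName().toString());
            }
        }
        Collections.sort(copied);
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SnapshotBackup <snapDir> <backupDir>");
            return;
        }
        System.out.println("copied: " + backupSnapshots(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

A sketch like this copies whatever is on disk at the time it runs. It skips transaction logs entirely, so it says nothing about whether the copied snapshots are sufficient for a restore.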

  • This approach is neither mainstream nor elegant, since it just uses transferTo to copy the snapshots in the dataDir, so I deleted the related code.
  • Snapshot on a path with no permission:

[zk: 127.0.0.1:2180(CONNECTED) 3] snapshot /data/forbidden_dir
Snapshot has failed. rc=-124
2019-05-12 19:02:49,839 [myid:] - WARN [SyncThread:0:FinalRequestProcessor@462] - Unexpected exception when taking the snapshot in the directory:/data/forbidden_dir
java.io.FileNotFoundException: /data/forbidden_dir/version-2/snapshot.fa0100018955 (Permission denied)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
    at org.apache.zookeeper.server.persistence.SnapStream.getOutputStream(SnapStream.java:129)
    at org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:214)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:435)
    at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:415)
    at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshotExternal(ZooKeeperServer.java:386)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:460)
    at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:154)

maoling avatar Apr 22 '19 11:04 maoling

Directed snapshots seem useful to me; I have reservations about snapshot taking being triggered by the ZooKeeper client. If it's just a way to freeze the FinalRequestProcessor and take a synchronous snapshot then is there a way to achieve that while using the admin server or jmx as the entry point?

enixon avatar May 13 '19 17:05 enixon

Let me take a close look at PR-180. After that, I will answer all the concerns together.

maoling avatar Jun 15 '19 14:06 maoling

If it's just a way to freeze the FinalRequestProcessor and take a synchronous snapshot ?

@enixon

  • A good insight. We can do the following in the FinalRequestProcessor:
            case OpCode.takeSnapshot: {
                 ------------------------------------------------------
//                new ZooKeeperThread("Client Snapshot Thread") {
//                    public void run() {
//                      try {
//                        zks.takeSnapshotExternal(dir);
//                      } catch (IOException e) {
//                      }
//                    }
//                }.start();
  • The real headache is the security issue of implementing it through the CLI:

Since this PR is targeting master, I suggest considering the option of adding a snap API to ZooKeeperAdmin, which was recently introduced to harden security around dynamic reconfiguration. ZooKeeperAdmin supports all sorts of authentication built into ZK, and we can extend it such that only an admin (or any user explicitly granted admin access to the cluster) can issue the snap command.

@hanm ZooKeeperAdmin may not currently address this authentication issue. As far as I can see, authorization for the CLI reconfig command depends on write permission on the node /zookeeper/config.

maoling avatar Aug 08 '19 08:08 maoling

@maoling @enixon I have similar thoughts. Triggering the backup with the admin server as the entry point is a great idea. However, taking a snapshot from the in-memory data tree without backing up the transaction logs does not seem sufficient, as the snapshot is fuzzy and may not represent the state of the data tree at any single point in time. If the transaction logs are not backed up, we may have data loss or data integrity issues when quorum is lost.

I wonder if there are any thoughts or discussions on providing a complete backup solution without the risk of losing data?
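To make the "fuzzy snapshot" concern concrete, here is a small self-contained sketch. It is a simplified model, not ZooKeeper code: the class FuzzySnapshotDemo, its Txn type, and the two-node tree are invented for illustration, and the interleaving of serialization and writes is scripted by hand rather than produced by real threads. It shows a snapshot that matches no state the tree ever had, and that replaying the logged transactions committed after the snapshot began brings it back in line with the live tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a data tree plus transaction log; not ZooKeeper's implementation.
public class FuzzySnapshotDemo {

    /** A committed write: set `path` to `value` at transaction id `zxid`. */
    static final class Txn {
        final long zxid;
        final String path;
        final String value;

        Txn(long zxid, String path, String value) {
            this.zxid = zxid;
            this.path = path;
            this.value = value;
        }
    }

    final Map<String, String> tree = new TreeMap<>();
    final List<Txn> txnLog = new ArrayList<>();
    // Every state the tree has actually been in, recorded after each commit.
    final List<Map<String, String>> history = new ArrayList<>();

    void commit(long zxid, String path, String value) {
        txnLog.add(new Txn(zxid, path, value));
        tree.put(path, value);
        history.add(new TreeMap<>(tree));
    }

    /** Serializes /a, lets two writes land, then serializes /b: a scripted fuzzy snapshot. */
    static Map<String, String> snapshotWithConcurrentWrites(FuzzySnapshotDemo server) {
        Map<String, String> snapshot = new TreeMap<>();
        snapshot.put("/a", server.tree.get("/a")); // serializer copies /a
        server.commit(3, "/a", "2");               // writes arrive mid-serialization
        server.commit(4, "/b", "2");
        snapshot.put("/b", server.tree.get("/b")); // serializer copies /b
        return snapshot;
    }

    /** Replays logged transactions newer than sinceZxid on top of a restored snapshot. */
    static Map<String, String> replay(Map<String, String> snapshot, List<Txn> log, long sinceZxid) {
        Map<String, String> restored = new TreeMap<>(snapshot);
        for (Txn txn : log) {
            if (txn.zxid > sinceZxid) {
                restored.put(txn.path, txn.value);
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        FuzzySnapshotDemo server = new FuzzySnapshotDemo();
        server.commit(1, "/a", "1");
        server.commit(2, "/b", "1");
        long snapshotStartZxid = 2; // last zxid applied before the snapshot began

        Map<String, String> snapshot = snapshotWithConcurrentWrites(server);
        System.out.println("snapshot alone: " + snapshot);
        System.out.println("matches a past state: " + server.history.contains(snapshot));
        System.out.println("snapshot + log: " + replay(snapshot, server.txnLog, snapshotStartZxid));
        System.out.println("live tree: " + server.tree);
    }
}
```

In this model the replay works because each write is idempotent, so re-applying a transaction the snapshot already reflects is harmless. A backup that keeps only the snapshot has no way to perform that repair step.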

Directed snapshots seem useful to me; I have reservations about snapshot taking being triggered by the ZooKeeper client. If it's just a way to freeze the FinalRequestProcessor and take a synchronous snapshot then is there a way to achieve that while using the admin server or jmx as the entry point?

li4wang avatar May 12 '22 00:05 li4wang