# Make `FlashbackToVersion` become a two-phase request
Ref https://github.com/tikv/tikv/issues/13303.
## Background
The current flashback process is described below (a rough client-side sketch follows the list):
- Client determines which key range to perform the flashback request on, e.g., TiDB uses the table key prefix `t_` as the key range to flash back the whole cluster.
- Client sends multiple requests to different regions on different stores with the latest `start_ts` and `commit_ts`.
- Each region handles its own flashback progress independently:
  - Lock the Raft proposing and lease read to block all requests, including reading, writing, and scheduling.
  - Read the old MVCC data and write it again with the given `start_ts` and `commit_ts` to pretend it's a new transaction commit.
  - Release the Raft proposing lock and resume the lease read.
- Client checks whether all the requests returned successfully, and retries those that failed with new `start_ts` and `commit_ts` until the whole flashback is done.
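To make the retry behavior concrete, here is a minimal sketch of the current one-phase client driver. The `RegionClient` trait and its methods (`regions_in_range`, `get_tso`, `flashback_to_version`) are hypothetical stand-ins for the real client and PD interfaces, not TiKV's actual API.

```rust
// Hypothetical stand-ins for the real client/PD interfaces.
struct Region;
struct Error;
struct KeyRange {
    start_key: Vec<u8>,
    end_key: Vec<u8>,
}

trait RegionClient {
    fn regions_in_range(&self, range: &KeyRange) -> Vec<Region>;
    fn get_tso(&mut self) -> Result<u64, Error>;
    fn flashback_to_version(
        &mut self,
        region: &Region,
        version: u64,
        start_ts: u64,
        commit_ts: u64,
    ) -> Result<(), Error>;
}

fn flashback_one_phase<C: RegionClient>(
    client: &mut C,
    range: &KeyRange,
    version: u64, // the historical version to flash back to
) -> Result<(), Error> {
    let mut pending = client.regions_in_range(range);
    while !pending.is_empty() {
        // Every retry round allocates *new* timestamps, so regions that
        // fail here may eventually commit the overwrite with a different
        // (start_ts, commit_ts) than regions that already succeeded --
        // exactly the atomicity problem described below.
        let start_ts = client.get_tso()?;
        let commit_ts = client.get_tso()?;
        let mut failed = Vec::new();
        for region in pending {
            if client
                .flashback_to_version(&region, version, start_ts, commit_ts)
                .is_err()
            {
                failed.push(region);
            }
        }
        pending = failed;
    }
    Ok(())
}
```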
Since the current implementation of the flashback is not atomic, there might be transaction problems during the process. For example, if another client tries to read the data during the flashback, it might see an incomplete transaction, because some regions may have completed the flashback while others have not.
Another problem is that regions that have not yet performed the flashback can still accept new writes from other clients, which is why the retry mechanism above re-fetches timestamps; as a result, different regions inside one flashback may use different `start_ts` and `commit_ts` values to write the overwriting MVCC versions. This also makes the flashback non-atomic.
## Solution
The solution is to make `FlashbackToVersion` a two-phase request. The new flashback process should look like this (a sketch of the two-phase client driver follows the list):
- Client determines which key range to perform the flashback request on, e.g., TiDB uses the table key prefix `t_` as the key range to flash back the whole cluster.
- Client sends multiple requests to different regions on different stores to lock the Raft proposing and lease read, blocking all requests including reading, writing, and scheduling.
- Client checks whether all the requests returned successfully, and retries those that failed until the whole lock phase is done.
- Client sends multiple requests to different regions with the latest `start_ts` and `commit_ts`.
- Each region handles its own flashback progress independently:
  - Read the old MVCC data and write it again with the given `start_ts` and `commit_ts` to pretend it's a new transaction commit.
  - Release the Raft proposing lock and resume the lease read.
- Client checks whether all the requests returned successfully, and retries those that failed with the same `start_ts` and `commit_ts` until the whole flashback is done.
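Under the same hypothetical `RegionClient` interface as above (extended with a `prepare_flashback` method, whose name is an assumption rather than the real RPC), the two-phase flow could look like this. Note that the timestamps are allocated exactly once, after every region is locked:

```rust
// Assumed extension of the hypothetical RegionClient trait above; the
// method name `prepare_flashback` is an assumption, not the real RPC.
trait PrepareFlashback: RegionClient {
    fn prepare_flashback(&mut self, region: &Region) -> Result<(), Error>;
}

fn flashback_two_phase<C: PrepareFlashback>(
    client: &mut C,
    range: &KeyRange,
    version: u64,
) -> Result<(), Error> {
    // Phase 1: lock every region in the range (blocking reads, writes, and
    // scheduling) before any MVCC data is rewritten; retry until all are locked.
    let mut pending = client.regions_in_range(range);
    while !pending.is_empty() {
        let mut failed = Vec::new();
        for region in pending {
            if client.prepare_flashback(&region).is_err() {
                failed.push(region);
            }
        }
        pending = failed;
    }

    // With all regions locked, no newer write can slip in, so the
    // timestamps are allocated exactly once and shared by every region.
    let start_ts = client.get_tso()?;
    let commit_ts = client.get_tso()?;

    // Phase 2: perform the actual flashback; retries reuse the *same*
    // (start_ts, commit_ts), unlike the one-phase version above.
    let mut pending = client.regions_in_range(range);
    while !pending.is_empty() {
        let mut failed = Vec::new();
        for region in pending {
            if client
                .flashback_to_version(&region, version, start_ts, commit_ts)
                .is_err()
            {
                failed.push(region);
            }
        }
        pending = failed;
    }
    Ok(())
}
```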
Since the first phase locks all regions and ensures none of them can accept any newer writes, all requests in the same flashback will write the overwriting MVCC data with the same `start_ts` and `commit_ts`. The lock phase also guarantees that no other transaction can commit successfully during or after the flashback, except the optimistic inserting transaction (see https://github.com/pingcap/tidb/issues/37961).
## Task
- [ ] Add a Raft admin command to put the region into a lock state that prevents any reading, writing, and scheduling, and persist this state in the `RegionLocalState`.
- [ ] Make `kv_flashback_to_version` a two-phase request as described above.
One more thing: if the region lock can support specifying a key range, it will be more convenient to implement other features later, such as flashing back a single table (see the sketch below).
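For illustration, the persisted lock state could carry an optional key range so that only part of a region is blocked. The type below is a sketch of what might be stored alongside `RegionLocalState`; all names and the layout are assumptions, not actual kvproto definitions.

```rust
// A sketch of a lock state that could be persisted next to RegionLocalState
// by the new Raft admin command; names and layout are assumptions.
enum FlashbackState {
    // Normal serving state: reads, writes, and scheduling are allowed.
    Idle,
    // The region rejects reads, writes, and scheduling until the flashback
    // finishes. Persisting this state lets a restarted peer re-enter the
    // locked state instead of silently serving traffic again.
    Locked {
        // Empty keys mean the whole region is locked; a non-empty range
        // would lock only the overlapping part, enabling features like
        // flashing back a single table.
        start_key: Vec<u8>,
        end_key: Vec<u8>,
    },
}
```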