# Make `FlashbackToVersion` become a two-phase request
Ref https://github.com/tikv/tikv/issues/13303.
## Background
The current flashback process is described below (a rough client-side sketch follows the list):
- Client determines which key range to perform the flashback request on, e.g., TiDB uses the table key prefix `t_` as the key range to flash back the whole cluster.
- Client sends multiple requests to different regions on different stores with the latest `start_ts` and `commit_ts`.
- Each region handles its own flashback progress independently:
  - Lock the Raft proposing and lease read to block all requests, including reading, writing, and scheduling.
  - Read the old MVCC data and write it again with the given `start_ts` and `commit_ts` to pretend it's a new transaction commit.
  - Release the Raft proposing lock and resume the lease read.
- Client checks whether all the requests returned successfully, and retries those that failed with new `start_ts` and `commit_ts` until the whole flashback is done.
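To make the retry behavior concrete, here is a minimal sketch of the current one-phase client driver. The `RegionClient` trait and its methods (`regions_in_range`, `get_tso`, `flashback_to_version`) are hypothetical stand-ins for the real client and PD interfaces, not TiKV's actual API.

```rust
// Hypothetical stand-ins for the real client/PD interfaces.
struct Region;
struct Error;
struct KeyRange {
    start_key: Vec<u8>,
    end_key: Vec<u8>,
}

trait RegionClient {
    fn regions_in_range(&self, range: &KeyRange) -> Vec<Region>;
    fn get_tso(&mut self) -> Result<u64, Error>;
    fn flashback_to_version(
        &mut self,
        region: &Region,
        version: u64,
        start_ts: u64,
        commit_ts: u64,
    ) -> Result<(), Error>;
}

fn flashback_one_phase<C: RegionClient>(
    client: &mut C,
    range: &KeyRange,
    version: u64, // the historical version to flash back to
) -> Result<(), Error> {
    let mut pending = client.regions_in_range(range);
    while !pending.is_empty() {
        // Every retry round allocates *new* timestamps, so regions that
        // fail here may eventually commit the overwrite with a different
        // (start_ts, commit_ts) than regions that already succeeded --
        // exactly the atomicity problem described below.
        let start_ts = client.get_tso()?;
        let commit_ts = client.get_tso()?;
        let mut failed = Vec::new();
        for region in pending {
            if client
                .flashback_to_version(&region, version, start_ts, commit_ts)
                .is_err()
            {
                failed.push(region);
            }
        }
        pending = failed;
    }
    Ok(())
}
```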
Since the current implementation of the flashback is not atomic, there might be transaction problems during the process. For example, if another client tries to read the data during the flashback, it might see an incomplete transaction, because some regions may have completed the flashback while others have not.
Another problem is that regions that have not yet performed the flashback can still accept new writes from other clients, which is why the retry mechanism above re-fetches timestamps; as a result, different regions inside one flashback may use different `start_ts` and `commit_ts` values to write the overwriting MVCC versions. This also makes the flashback non-atomic.
## Solution
The solution is to make `FlashbackToVersion` a two-phase request. The new flashback process should look like this (a sketch of the two-phase client driver follows the list):
- Client determines which key range to perform the flashback request on, e.g., TiDB uses the table key prefix `t_` as the key range to flash back the whole cluster.
- Client sends multiple requests to different regions on different stores to lock the Raft proposing and lease read, blocking all requests including reading, writing, and scheduling.
- Client checks whether all the requests returned successfully, and retries those that failed until the whole lock phase is done.
- Client sends multiple requests to different regions with the latest `start_ts` and `commit_ts`.
- Each region handles its own flashback progress independently:
  - Read the old MVCC data and write it again with the given `start_ts` and `commit_ts` to pretend it's a new transaction commit.
  - Release the Raft proposing lock and resume the lease read.
- Client checks whether all the requests returned successfully, and retries those that failed with the same `start_ts` and `commit_ts` until the whole flashback is done.
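Under the same hypothetical `RegionClient` interface as above (extended with a `prepare_flashback` method, whose name is an assumption rather than the real RPC), the two-phase flow could look like this. Note that the timestamps are allocated exactly once, after every region is locked:

```rust
// Assumed extension of the hypothetical RegionClient trait above; the
// method name `prepare_flashback` is an assumption, not the real RPC.
trait PrepareFlashback: RegionClient {
    fn prepare_flashback(&mut self, region: &Region) -> Result<(), Error>;
}

fn flashback_two_phase<C: PrepareFlashback>(
    client: &mut C,
    range: &KeyRange,
    version: u64,
) -> Result<(), Error> {
    // Phase 1: lock every region in the range (blocking reads, writes, and
    // scheduling) before any MVCC data is rewritten; retry until all are locked.
    let mut pending = client.regions_in_range(range);
    while !pending.is_empty() {
        let mut failed = Vec::new();
        for region in pending {
            if client.prepare_flashback(&region).is_err() {
                failed.push(region);
            }
        }
        pending = failed;
    }

    // With all regions locked, no newer write can slip in, so the
    // timestamps are allocated exactly once and shared by every region.
    let start_ts = client.get_tso()?;
    let commit_ts = client.get_tso()?;

    // Phase 2: perform the actual flashback; retries reuse the *same*
    // (start_ts, commit_ts), unlike the one-phase version above.
    let mut pending = client.regions_in_range(range);
    while !pending.is_empty() {
        let mut failed = Vec::new();
        for region in pending {
            if client
                .flashback_to_version(&region, version, start_ts, commit_ts)
                .is_err()
            {
                failed.push(region);
            }
        }
        pending = failed;
    }
    Ok(())
}
```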
Since the first phase locks all regions and ensures none of them can accept any newer writes, all requests in the same flashback will write the overwriting MVCC data with the same `start_ts` and `commit_ts`. The lock phase also guarantees that no other transaction can commit successfully during or after the flashback, except the optimistic inserting transaction (see https://github.com/pingcap/tidb/issues/37961).
## Task
- [ ] Add a Raft admin command to put the region into a lock state that prevents any reading, writing, and scheduling, and persist this state in the `RegionLocalState`.
- [ ] Make `kv_flashback_to_version` a two-phase request as described above.
One more thing: if the region lock can support specifying a key range, it will be more convenient to implement other features later, such as flashing back a single table (see the sketch below).
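For illustration, the persisted lock state could carry an optional key range so that only part of a region is blocked. The type below is a sketch of what might be stored alongside `RegionLocalState`; all names and the layout are assumptions, not actual kvproto definitions.

```rust
// A sketch of a lock state that could be persisted next to RegionLocalState
// by the new Raft admin command; names and layout are assumptions.
enum FlashbackState {
    // Normal serving state: reads, writes, and scheduling are allowed.
    Idle,
    // The region rejects reads, writes, and scheduling until the flashback
    // finishes. Persisting this state lets a restarted peer re-enter the
    // locked state instead of silently serving traffic again.
    Locked {
        // Empty keys mean the whole region is locked; a non-empty range
        // would lock only the overlapping part, enabling features like
        // flashing back a single table.
        start_key: Vec<u8>,
        end_key: Vec<u8>,
    },
}
```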