sig-transaction Find a solution for schema version check problem for 1PC

Previously we found that schema version checking might be a problem. Later we concluded that it can be solved if we define a transaction to be committed iff all its keys are prewritten and the schema version doesn't change between its start_ts and commit_ts. However, this still doesn't solve the problem for 1PC, which has no chance to check the schema version at all. We need to find a solution for this if we want to implement 1PC.
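To make the commit rule above concrete, here is a minimal sketch. The names (SchemaChange, schema_changed_between, is_committed) are hypothetical and not taken from TiDB or TiKV, and the schema history is modeled as a plain list of timestamps at which new schema versions took effect.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SchemaChange:
    """Hypothetical record of one schema version taking effect."""
    version: int
    commit_ts: int  # timestamp at which this schema version became visible


def schema_changed_between(changes: Iterable[SchemaChange],
                           start_ts: int, commit_ts: int) -> bool:
    """Return True if any schema change took effect in (start_ts, commit_ts]."""
    return any(start_ts < c.commit_ts <= commit_ts for c in changes)


def is_committed(all_keys_prewritten: bool,
                 changes: Iterable[SchemaChange],
                 start_ts: int, commit_ts: int) -> bool:
    """Committed iff every key is prewritten and the schema did not change
    between start_ts and commit_ts."""
    return all_keys_prewritten and not schema_changed_between(
        changes, start_ts, commit_ts)
```

The sketch also shows why 1PC is awkward: the check needs commit_ts, and with 1PC there is no separate step after prewrite in which the client could run it.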

From @sticnarf: Maybe we can have something like max_commit_ts, which acts like a lease. We send it in prewrite. If the calculated min_commit_ts > max_commit_ts, the prewrite fails (and the transaction can fall back). When doing DDL, we invalidate max_commit_ts to disable async commit (or 1PC), but ensure that changes committed before max_commit_ts are valid with the previous schema.
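A rough sketch of the check described above, under stated assumptions: PrewriteResult and prewrite_check are hypothetical names, min_commit_ts is assumed to be one more than the larger of start_ts and the largest timestamp the store has seen, and the fallback is assumed to be an ordinary two-phase commit. None of this is the actual TiKV code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrewriteResult:
    """Hypothetical result of a prewrite that carries max_commit_ts."""
    ok: bool
    min_commit_ts: Optional[int] = None
    fallback: bool = False  # assumption: caller retries as a normal 2PC


def prewrite_check(start_ts: int, max_ts_seen: int,
                   max_commit_ts: int) -> PrewriteResult:
    # Assumption: min_commit_ts must exceed both the transaction's start_ts
    # and the largest timestamp the store has observed so far.
    min_commit_ts = max(start_ts, max_ts_seen) + 1
    if min_commit_ts > max_commit_ts:
        # The schema "lease" no longer covers the commit timestamp,
        # so async commit / 1PC is refused for this transaction.
        return PrewriteResult(ok=False, fallback=True)
    return PrewriteResult(ok=True, min_commit_ts=min_commit_ts)
```

For example, prewrite_check(100, 120, 200) succeeds with min_commit_ts 121, while prewrite_check(100, 250, 200) is refused and marked for fallback.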

Sep 02 '20 05:09 MyonKeminta

Update:

With https://github.com/pingcap/tidb/pull/20550, we rely on checks on max_commit_ts to guarantee that async commit and 1PC transactions conform to the old schema.

However, there are still some subtle and not so serious cases we haven't solved yet:

[ ] MODIFY COLUMN changes default value
- A transaction prewrites with the old default value while the DDL happens during the transaction. The DDL may then finish first, after which the transaction commits with the old default.
- This is not a serious problem because the user should not have any particular expectation about the default value while a DDL that changes the default is running concurrently.
- An improvement is to implement amendment for "change default". Then only a DDL that happens concurrently with prewrite can cause such a problem, and the problem is only discoverable by comparing the transaction's commit TS with the DDL's commit TS. It shouldn't be a real problem for users.
[ ] MODIFY COLUMN from allow null to not null
- A transaction triggers prewrite before the not-null flag is set, and the prewrite request arrives after the DDL's SELECT check. The transaction may then bypass the DDL's check and end up committing a null value.
- This is also not very serious (no data-index inconsistency), but it is indeed unexpected behavior for the user.
- Possible solution: as with DDLs that need reorganization, we can delay 2 seconds after the not-null flag is set and then recheck. We also check before prewrite whether a null value will be written. Then, with the protection of max_commit_ts, we can avoid writing a null value.
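
The possible solution for the not-null case could be sketched roughly as follows. All names here (check_no_null_writes, set_not_null, scan_for_nulls) are hypothetical, the schema is modeled as a plain dict, and rolling the flag back when the recheck finds a null is an assumption of this sketch, not something decided in this thread.

```python
import time
from typing import Callable, Dict, Optional, Set


def check_no_null_writes(row: Dict[str, Optional[object]],
                         not_null_columns: Set[str]) -> None:
    """Client-side check before prewrite: refuse NULLs in NOT NULL columns."""
    bad = sorted(c for c in not_null_columns if row.get(c) is None)
    if bad:
        raise ValueError(f"NULL written to NOT NULL column(s): {bad}")


def set_not_null(schema: Dict[str, Set[str]], column: str,
                 scan_for_nulls: Callable[[str], bool],
                 delay_seconds: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """DDL side: set the flag, wait, then recheck existing data."""
    schema["not_null"].add(column)   # 1. new transactions now see the flag
    sleep(delay_seconds)             # 2. let in-flight prewrites arrive
    if scan_for_nulls(column):       # 3. recheck after the delay
        # Assumption: undo the flag and report failure if a NULL slipped in.
        schema["not_null"].discard(column)
        return False
    return True
```

The sleep parameter is injectable only so that the delay can be skipped when experimenting with the sketch.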

cc @MyonKeminta @coocood @cfzjywxk

Oct 28 '20 09:10 sticnarf

admin repair table can corrupt data. But we may not need to solve it, since it's rarely used and users should be aware of its consequences.

Nov 05 '20 04:11 ekexium
