sig-transaction icon indicating copy to clipboard operation
sig-transaction copied to clipboard

Find a solution for schema version check problem for 1PC

Open MyonKeminta opened this issue 4 years ago • 2 comments

Previously we found that schema version checking might be a problem. Later we think it can be solved if we define a transaction to be committed iff all its keys are prewritten, and, the schema version doesn't change between its start_ts and commit_ts. However it still solve the problem of 1PC, which have no chance to check the schema version at all. We need to find a solution for this if we want to implement 1PC.


From @sticnarf : Maybe we can have something like max_commit_ts. It's also something like lease. We send it in prewrite. If the calculated min_commit_ts > max_commit_ts , the prewrite will fail (can fallback). When doing DDL, we invalidate max_commit_ts to disable async commit (or 1PC), but ensure changes before max_commit_ts are valid with the previous schema.

MyonKeminta avatar Sep 02 '20 05:09 MyonKeminta

Update:

With https://github.com/pingcap/tidb/pull/20550, we rely on checks on max_commit_ts to guarantee async commit and 1PC transactions conforms the old schema.

However, there are still some subtle and not so serious cases we haven't solved yet:

  • [ ] MODIFY COLUMN changes default value

    • A transaction prewrites with the old default value while the DDL happens during the transaction. Then we may first get the done, then commit with the old default.
    • This is not a serious problem because the user should not have a certain expectation about the default value while there is a concurrent change default DDL.
    • An improvement is to implement amendment for "change default". So only a concurrent DDL happens during prewrite will cause such a problem. And such a problem is only discoverable if we compare the transaction commit TS with the DDL commit TS. It shouldn't be a real problem for users.
  • [ ] MODIFY COLUMN from allow null to not null

    • A transaction triggers prewrite before non-null flags are set. And the prewrite request arrives later than DDL's SELECT check. Then, we may bypass DDL's check and finally commits a null value.
    • This is also not very serious (no data-index inconsistency). But it's indeed an unexpected behavior to user.
    • Possible solution: like DDLs that need reorganization, we can delay 2 seconds after the non-null flag is set and the recheck. And we also check whether a null value will be written before prewrite. Then we can avoid writing a null value with the protection of max_commit_ts.

cc @MyonKeminta @coocood @cfzjywxk

sticnarf avatar Oct 28 '20 09:10 sticnarf

admin repair table can corrupt data. But we may not need to solve it since it's rare and users should be aware of its consequence.

ekexium avatar Nov 05 '20 04:11 ekexium