sig-transaction
sig-transaction copied to clipboard
Find a solution for schema version check problem for 1PC
Previously we found that schema version checking might be a problem. Later we think it can be solved if we define a transaction to be committed iff all its keys are prewritten, and, the schema version doesn't change between its start_ts and commit_ts. However it still solve the problem of 1PC, which have no chance to check the schema version at all. We need to find a solution for this if we want to implement 1PC.
From @sticnarf : Maybe we can have something like max_commit_ts
. It's also something like lease.
We send it in prewrite. If the calculated min_commit_ts > max_commit_ts
, the prewrite will fail (can fallback).
When doing DDL, we invalidate max_commit_ts
to disable async commit (or 1PC), but ensure changes before max_commit_ts
are valid with the previous schema.
Update:
With https://github.com/pingcap/tidb/pull/20550, we rely on checks on max_commit_ts
to guarantee async commit and 1PC transactions conforms the old schema.
However, there are still some subtle and not so serious cases we haven't solved yet:
-
[ ]
MODIFY COLUMN
changes default value- A transaction prewrites with the old default value while the DDL happens during the transaction. Then we may first get the done, then commit with the old default.
- This is not a serious problem because the user should not have a certain expectation about the default value while there is a concurrent change default DDL.
- An improvement is to implement amendment for "change default". So only a concurrent DDL happens during prewrite will cause such a problem. And such a problem is only discoverable if we compare the transaction commit TS with the DDL commit TS. It shouldn't be a real problem for users.
-
[ ]
MODIFY COLUMN
from allow null to not null- A transaction triggers prewrite before non-null flags are set. And the prewrite request arrives later than DDL's SELECT check. Then, we may bypass DDL's check and finally commits a null value.
- This is also not very serious (no data-index inconsistency). But it's indeed an unexpected behavior to user.
- Possible solution: like DDLs that need reorganization, we can delay 2 seconds after the non-null flag is set and the recheck. And we also check whether a null value will be written before prewrite. Then we can avoid writing a null value with the protection of
max_commit_ts
.
cc @MyonKeminta @coocood @cfzjywxk
admin repair table
can corrupt data. But we may not need to solve it since it's rare and users should be aware of its consequence.