tidb icon indicating copy to clipboard operation
tidb copied to clipboard

doc: lazy constraint check in pessimistic txn

Open sticnarf opened this issue 3 years ago • 15 comments

What problem does this PR solve?

Issue Number: ref #36579

Check List

Tests

  • [ ] Unit test
  • [ ] Integration test
  • [ ] Manual test (add detailed scripts or steps below)
  • [x] No code

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

sticnarf avatar Aug 04 '22 06:08 sticnarf

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • cfzjywxk
  • ekexium

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment. After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review. Reviewer can cancel approval by submitting a request changes review.

ti-chi-bot avatar Aug 04 '22 06:08 ti-chi-bot

cc @ekexium @TonsnakeLin @cfzjywxk

sticnarf avatar Aug 04 '22 06:08 sticnarf

Code Coverage Details: https://codecov.io/github/pingcap/tidb/commit/a197163ea7165467fe83c34dd1a0f31139b5db12

sre-bot avatar Aug 04 '22 07:08 sre-bot

Shall we add a section to demonstrate the safety of the change since the constraint check mechanism is subtle and vulnerable?

ekexium avatar Aug 04 '22 07:08 ekexium

~~I am somewhat convinced of the safety of write: writes won't break the integrity of data in the storage. While the safety of read, namely read statements won't return results that break the constraint, may require further consideration.~~

A case similar to that in #24195, if we enable the feature:

create table t2 (id int primary key, uk int, unique key i1(uk))
insert into t2 values (1, 1)
set @@tidb_txn_assertion_level=off

begin pessimistic
insert into t2 values (2, 1)
select * from t2 use index(primary) for update -- same result whether for update or not
delete from t2 where id = 1
commit

admin check table t2

The select result is 2 rows (1, 1) and (2, 1), which breaks the uniqueness constraint. While the final commit can succeed. And the admin check will report data-index inconsistency. BTW, assertion could prevent the txn from committing.

"Acquisition of lock always resolves constraint check" doesn't help. The feature sort of enlarges the anomaly

ekexium avatar Aug 09 '22 10:08 ekexium

For reasoning its safety, I'm considering what if we allow an arbitrary mix of optimistic and pessimistic mutations in one transaction? Will it break any safety property?

ekexium avatar Aug 10 '22 05:08 ekexium

For reasoning its safety, I'm considering what if we allow an arbitrary mix of optimistic and pessimistic mutations in one transaction? Will it break any safety property?

Pessimistic mutations are inheritantly no different from optimistic mutations after they are turned into 2pc locks in prewrite, so I think it is very unlikely that this will break any properties.

Instead, I worry more about the handling about union scan and generating the mutations. It's more complicated when the result set sometimes break constraints but the transaction can commit in the end.

sticnarf avatar Aug 10 '22 06:08 sticnarf

the handling about union scan and generating the mutations

There're several cases we may need to design and handle carefully, for example:

  • delete-your-write from which we've encountered data inconsistency issues before. The inserted content may not need to be committed but constraint checks are still required.
  • Report duplicate error in time and avoid further weird behaviors considering the unionScan executor.
  • The transactional key flags and statement level key flags processing which are easy to make mistakes.

cfzjywxk avatar Aug 10 '22 09:08 cfzjywxk

When a key that has its constraint check deferred in a previous statement gets modified, we have 2 options:

  1. Lock it as we did before, resolve the constraint check and unset the defer flag Then we need some kind of a "once flag" mechanism, considering the possibility that a statement may involve multiple operations on a key. Any modifications to the key (even in the same statement) will unset the flag.

  2. Skip the lock. The reason is the same as the one that let us skip the constraint check: if the user believes the key is less likely to have conflicts, why don't we skip the lock as well? It doesn't affect the success rate after all. It requires PresumeNotExist flags on more than PUT mutations. An extra Op like DeleteMustExist needs to be added, at least.

ekexium avatar Aug 10 '22 09:08 ekexium

2. Skip the lock. The reason is the same as the one that let us skip the constraint check: if the user believes the key is less likely to have conflicts, why don't we skip the lock as well? It doesn't affect the success rate after all. It requires PresumeNotExist flags on more than PUT mutations. An extra Op like DeleteMustExist needs to be added, at least.

What I'm not sure about the second option is the risk of how we handle a result set with broken constraints. And you have found an example above that there can be inconsistency unless we add new special ops. There could be other possible issues that we haven't found.

sticnarf avatar Aug 10 '22 09:08 sticnarf

There could be other possible issues that we haven't found.

Yes, I'm pursuing proof of the safety of this change, whichever options we take.

ekexium avatar Aug 10 '22 09:08 ekexium

select * from t2 use index(primary) for update -- same result whether for upda

@ekexium Does it happen because the select for update statement does not acquire the pessimistic lock on the unique key but just the row key? Actually, the unique key should also be locked though it's not changed as described in the issue. It seems that if the row key and unique key are always locked Acquisition of lock always resolves constraint check should be able to work.

cfzjywxk avatar Aug 11 '22 06:08 cfzjywxk

select * from t2 use index(primary) for update -- same result whether for upda

@ekexium Does it happen because the select for update statement does not acquire the pessimistic lock on the unique key but just the row key? Actually, the unique key should also be locked though it's not changed as described in the issue. It seems that if the row key and unique key are always locked Acquisition of lock always resolves constraint check should be able to work.

@cfzjywxk The problem exists in the implementation of option 2 mentioned above (so neither row key nor unique key is locked in the select for update and delete). Option 1 is free from this issue.

ekexium avatar Aug 11 '22 06:08 ekexium

Actually, the unique key should also be locked though it's not changed as described in the https://github.com/pingcap/tidb/issues/36438.

For option 1, would this still be a problem if the lock on a unique key is deferred and then the select for update does not lock it as expected because of the issue 36438 so the constraint resolving does not work on this unique key?

cfzjywxk avatar Aug 11 '22 07:08 cfzjywxk

Actually, the unique key should also be locked though it's not changed as described in the #36438.

For option 1, would this still be a problem if the lock on a unique key is deferred and then the select for update does not lock it as expected because of the issue 36438 so the constraint resolving does not work on this unique key?

I think so. I mean it's free from data-index inconsistency. The seemingly invalid select result is an old issue. It's better to fix it, while our bottom line is that we guarantee the constraint satisfaction of read results only if the transaction successfully commits.

ekexium avatar Aug 11 '22 07:08 ekexium

/merge

cfzjywxk avatar Aug 17 '22 13:08 cfzjywxk

This pull request has been accepted and is ready to merge.

Commit hash: cf11337af2278eb15a70df914f8e48b00fd87f7d

ti-chi-bot avatar Aug 17 '22 13:08 ti-chi-bot

TiDB MergeCI notify

CI Name Result Duration Compare with Parent commit
idc-jenkins-ci/integration-cdc-test 🟢 all 36 tests passed 54 min Existing passed
idc-jenkins-ci-tidb/tics-test 🟢 all 1 tests passed 18 min Existing passed
idc-jenkins-ci-tidb/integration-common-test 🟢 all 17 tests passed 12 min Existing passed
idc-jenkins-ci-tidb/common-test 🟢 all 11 tests passed 12 min Existing passed
idc-jenkins-ci-tidb/integration-ddl-test 🟢 all 6 tests passed 5 min 51 sec Existing passed
idc-jenkins-ci-tidb/sqllogic-test-2 🟢 all 28 tests passed 5 min 31 sec Existing passed
idc-jenkins-ci-tidb/sqllogic-test-1 🟢 all 26 tests passed 4 min 18 sec Existing passed
idc-jenkins-ci-tidb/mybatis-test 🟢 all 1 tests passed 3 min 43 sec Existing passed
idc-jenkins-ci-tidb/integration-compatibility-test 🟢 all 1 tests passed 3 min 12 sec Existing passed
idc-jenkins-ci-tidb/plugin-test 🟢 build success, plugin test success 4min Existing passed

sre-bot avatar Aug 17 '22 14:08 sre-bot