[MDEV-32652] New sql_mode to prevent BEGIN inside a transaction
- [x] The Jira issue number for this PR is: MDEV-32652
Description
Currently a 'START TRANSACTION'/'BEGIN' statement implicitly commits the existing transaction and starts a new one. This conflicts with ANSI SQL standard (2023):
If an SQL-transaction is currently active, then an exception condition
is raised: invalid transaction state — active SQL-transaction (25001).
Thus change behavior to emit ER_CANT_DO_THIS_DURING_AN_TRANSACTION when such statement is issued inside an active transaction.
Before:
> START TRANSACTION;
> START TRANSACTION; ->This will work and trigger an implicitly commit
Now:
> START TRANSACTION;
> START TRANSACTION;
ERROR 25000: You are not allowed to execute this command in a transaction
To maintain backward compatibility, this is NOT enabled by default until sql_mode is set with NO_NEW_TRANS_IN_TRANS. This flag will also be set automatically when sql_mode is set with ANSI group, because the new feature aligns with the tenet of ANSI group.
Note 1:
This will be reflected as error 25000, which is different than 25001 referred by SQL standard. This is because 25001 in MariaDB is reserved for another error when modifying isolation level of a transaction.
Note 2:
This does not error when there is a implicit multi-statement transaction (aka. autocommit=off) because doing so is essentially disabling this statement when autocommit=off and that severely breaks compatibility. See code comment for more details.
How can this PR be tested?
All existing test cases in main, sys_vars and func_1 test suite passed. Two new test cases added into main/sql_mode and main/sql_mode_transaction for the new feature.
Basing the PR against the correct MariaDB version
- [x] This is a new feature and the PR is based against the latest MariaDB development branch
Backward compatibility
To maintain backward compatibility, this is NOT enabled by default until sql_mode is set with NO_NEW_TRANS_IN_TRANS. This flag will also be set automatically when sql_mode is set with ANSI group, because the new feature aligns with the tenet of ANSI group.
PR quality check
- [x] I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
Copyright
All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
Checks buildbot/amd64-debian-11-debug-ps-embedded:
Test failed for innodb_gis.rtree_purge with error Test case timeout after 900 seconds, but it passed on automatic re-try. This is the only failed test. Not sure why this check is marked as failed.
innodb_gis.rtree_purge '32k,innodb' w9 [ retry-pass ] 480054
Checks buildbot/amd64-fedora-38-last-N-failed:
Tests that failed is galera_3nodes.galera_gtid_consistency
At line 15: query 'select 1' failed with wrong errno <Unknown> (2026): 'TLS/SSL error: shutdown while in init', instead of (0)...
This error is not in the list of expected errors defined in line of "wait_until_connected_again.inc". I do not believe this is relevant to my change. Reason is more like the newly added test galera_gtid_consistency (added 1 month ago) needs better timing control. I have seen this test failure in other pull requests as well.
Please let me know if it makes sense, and help approve if so. Thanks!
Checks
buildbot/amd64-debian-11-debug-ps-embedded:Test failed for
innodb_gis.rtree_purgewith errorTest case timeout after 900 seconds, but it passed on automatic re-try. This is the only failed test. Not sure why this check is marked as failed.
MDEV-15284 and some other open bugs demonstrate that the locking in InnoDB SPATIAL INDEX is broken from the beginning. You can ignore that.
This error is not in the list of expected errors defined in line of "wait_until_connected_again.inc". I do not believe this is relevant to my change. Reason is more like the newly added test
galera_gtid_consistency(added 1 month ago) needs better timing control. I have seen this test failure in other pull requests as well.
Galera tests are notably unstable, partly due to race conditions in the tests or in the code, partly due to some not-accounted-for error codes in case of disconnect. This one might be a simple case of MDEV-30587, which is currently assigned to @grooverdan.
Thanks @dr-m for reviewing! Could you please clarify your feedback a bit and state what (if anything) you request to be changed or as next steps in general for this change?
Thanks @dr-m for reviewing! Could you please clarify your feedback a bit and state what (if anything) you request to be changed or as next steps in general for this change?
@ottok My comments were not meant to be a review, and this ticket has not been assigned to my review. I was only trying to explain why some test failures are unrelated to these changes. Recently, there has been some effort to fix frequently failing tests; I believe that the last-N-failed builders are related to that.
My area of expertise is the InnoDB storage engine, and this pull request is not changing any code there.
Somewhat related to this, in MDEV-24813 I found out that LOCK TABLES can implicitly commit a transaction. That is partly why you need to write something funny to acquire an exclusive table lock in InnoDB:
SET autocommit = 0;
START TRANSACTION;
LOCK TABLE t1 WRITE;
It could be interesting to add some tests for LOCK TABLES. It might be not necessary to change any related code, just to document what would happen.