decouple olap tx timeout from oltp tx timeout
Description
Since workload=olap bypasses the query timeouts
(--queryserver-config-query-timeout) and also row limits, the natural
assumption is that it also bypasses the transaction timeout.
This is not the case: for example, on a tablet where
--queryserver-config-transaction-timeout is 10, OLAP transactions are still killed after 10 seconds.
This PR
- Adds a new CLI flag and YAML field to independently configure TX timeouts for OLAP workloads: --queryserver-config-olap-transaction-timeout, with a default value of 0 seconds, which disables OLAP TX timeouts.
- Decouples the TX kill interval from the OLTP TX timeout via a new CLI flag and YAML field: --queryserver-config-transaction-killer-interval, defaulting to 3 seconds.
One subtlety is that the timeout that is applied to the transaction is based on the value of the workload setting at the beginning of the transaction. If the workload is changed mid-transaction, that may change the timeout applied to queries within the transaction, but it won't change the transaction timeout.
Demo
Using the (new) default values, connected to VTGate.
mysql> set workload=oltp;
Query OK, 0 rows affected (0.00 sec)
mysql> begin ; select 1 from data limit 1; select sleep(35); commit;
Query OK, 0 rows affected (0.00 sec)
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)
ERROR 1317 (70100): target: dst.-.primary: vttablet: (errno 2013) due to context deadline exceeded, elapsed time: 30.00055243s, killing query ID 106 (CallerID: maxenglander)
ERROR 1317 (70100): target: dst.-.primary: vttablet: rpc error: code = Aborted desc = transaction 1659665741215176173: ended at 2022-08-05 02:18:29.360 UTC (unlocked closed connection) (CallerID: maxenglander)
mysql> set workload=olap;
Query OK, 0 rows affected (0.01 sec)
mysql> begin ; select 1 from data limit 1; select sleep(35); commit;
Query OK, 0 rows affected (0.00 sec)
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)
+-----------+
| sleep(35) |
+-----------+
| 0 |
+-----------+
1 row in set (35.00 sec)
Query OK, 0 rows affected (0.01 sec)
Breaking changes
Currently OLAP transactions are killed after --queryserver-config-transaction-timeout seconds. With this PR, OLAP transactions are killed after --queryserver-config-olap-transaction-timeout seconds (default value 0 means transactions are not timed out).
Currently OLTP and OLAP transactions are evaluated for killing every --queryserver-config-transaction-timeout seconds divided by 10. With this PR, OLAP and OLTP transactions are evaluated for killing every --queryserver-config-transaction-killer-interval seconds.
Related Issue(s)
#10945
Checklist
- [ ] "Backport me!" label has been added if this change should be backported
- [ ] Tests were added or are not required
- [ ] Documentation was added or is not required
Deployment Notes
Review Checklist
Hello reviewers! :wave: Please follow this checklist when reviewing this Pull Request.
General
- [ ] Ensure that the Pull Request has a descriptive title.
- [ ] If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
- [ ] If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive.
- [ ] If a workflow is added or modified, each item in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.
Bug fixes
- [ ] There should be at least one unit or end-to-end test.
- [ ] The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.
Non-trivial changes
- [ ] There should be some code comments as to why things are implemented the way they are.
New/Existing features
- [ ] Should be documented, either by modifying the existing documentation or creating new documentation.
- [ ] New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.
Backward compatibility
- [ ] Protobuf changes should be wire-compatible.
- [ ] Changes to _vt tables and RPCs need to be backward compatible.
- [ ] vtctl command output order should be stable and awk-able.
@maxenglander we'll get this reviewed. Can you add notes to the 15_0_0_summary.md file and resolve the conflicts in the meantime?
@deepthi conflicts resolved and release notes added
@harshit-gangal I implemented the one outstanding suggestion I found (timeout => conn.timeout), sorry for missing that until now. Also reverted the unrelated fmt changes (used git commit --no-verify).