MDEV-33543 Server hang caused by InnoDB change buffer
- [x] The Jira issue number for this PR is: MDEV-33543
Description
Issue: When a page is fetched through buf_page_get_gen with no latch (RW_NO_LATCH), the caller is not expected to follow the B-tree latching order. However, buf_page_get_low unconditionally tries to acquire a shared page latch in order to wait for a page that another thread is loading concurrently. In general, this can lead to a latch order violation and a deadlock.
Currently this affects the change buffer insert path: btr_latch_prev() tries to load the previous page out of order with RW_NO_LATCH, and two concurrent inserts into the change buffer (IBUF) tree can deadlock. The problem was introduced in 10.6 by MDEV-27058.
Fix: When latching a page that was requested with RW_NO_LATCH, always use the "*lock_try" interface; on failure, unfix the page and retry the operation.
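The try-and-retry idea can be illustrated with a minimal, self-contained C++ sketch. This is not the InnoDB code: `PageLatch`, `Page`, `get_page_no_latch` and the `max_attempts` bound are hypothetical names made up for illustration, and the real patch retries the page lookup rather than giving up after a fixed number of attempts. The point shown is only that the waiter never blocks on the latch and releases its buffer-fix before every retry.

```cpp
#include <atomic>
#include <thread>

// Hypothetical non-blocking latch: -1 = held exclusively, n > 0 = n shared holders.
class PageLatch {
  std::atomic<int> state_{0};
public:
  bool try_lock_exclusive() {
    int expected = 0;
    return state_.compare_exchange_strong(expected, -1);
  }
  void unlock_exclusive() { state_.store(0); }
  bool try_lock_shared() {
    int s = state_.load();
    while (s >= 0) {
      // On failure, s is reloaded with the current state and the loop re-checks it.
      if (state_.compare_exchange_weak(s, s + 1)) return true;
    }
    return false;  // exclusively held, e.g. by the thread reading the page in
  }
  void unlock_shared() { state_.fetch_sub(1); }
};

// Hypothetical stand-in for a buffer pool page.
struct Page {
  PageLatch latch;
  std::atomic<int> fix_count{0};  // buffer-fix count that pins the page
};

// Returns true with the page buffer-fixed but unlatched, or false if the
// latch could not be taken within max_attempts tries (page left unfixed).
bool get_page_no_latch(Page& page, int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    page.fix_count.fetch_add(1);         // buffer-fix the page
    if (page.latch.try_lock_shared()) {  // never wait on the latch
      page.latch.unlock_shared();        // the load has finished; keep only the fix
      return true;
    }
    page.fix_count.fetch_sub(1);         // unfix before retrying
    std::this_thread::yield();
  }
  return false;
}
```

Because the caller holds neither a latch nor a buffer-fix on the page while it waits between attempts, it cannot take part in a latch-order cycle with the thread that is loading the page.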
Release Notes
MDEV-27058 introduced this regression in 10.6.6. It could hang the server under load if innodb_change_buffering is enabled and there are tables with non-unique secondary indexes.
How can this PR be tested?
It is difficult to maintain an automated mtr test for this in our regular test suite. The issue can be reproduced with the RQG test and with the code/test patch attached to the MDEV, which also gives steps for manual reproduction.
Basing the PR against the correct MariaDB version
- [ ] This is a new feature and the PR is based against the latest MariaDB development branch.
- [x] This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.
PR quality check
- [x] I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
- [x] For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.