MDEV-34750 SET GLOBAL innodb_log_file_size is not crash safe

Open dr-m opened this issue 1 year ago • 2 comments
  • [x] The Jira issue number for this PR is: MDEV-34750

Description

log_t::resize_write_buf(): Revert an inadvertent change that was made in commit 4ca355d863e3b6a14439eebbb2958afccb3548e3 (MDEV-33894). While the log is being resized, ib_logfile101 is supposed to be written from log_sys.resize_flush_buf or log_sys.resize_buf, not from log_sys.buf.
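To illustrate the class of mistake described above, here is a toy Python model, not MariaDB code. The class `ToyLog` and its methods are hypothetical names; only the roles of the buffers (`log_sys.buf` versus `log_sys.resize_buf`) come from the description. The real function handles details that this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Hypothetical stand-in for the log subsystem during an online resize."""
    buf: bytes          # plays the role of log_sys.buf (main log buffer)
    resize_buf: bytes   # plays the role of log_sys.resize_buf
    resized_file: bytearray = field(default_factory=bytearray)  # ib_logfile101

    def resize_write_wrong_source(self, length: int) -> None:
        # The kind of mistake described above: the file being resized
        # receives bytes taken from the main buffer.
        self.resized_file += self.buf[:length]

    def resize_write_intended_source(self, length: int) -> None:
        # Intended behaviour: the file being resized receives bytes
        # taken from the resize buffer.
        self.resized_file += self.resize_buf[:length]


def demo() -> tuple:
    """Return the file contents produced by each variant."""
    wrong = ToyLog(buf=b"MAIN" * 4, resize_buf=b"RSZ!" * 4)
    wrong.resize_write_wrong_source(8)
    intended = ToyLog(buf=b"MAIN" * 4, resize_buf=b"RSZ!" * 4)
    intended.resize_write_intended_source(8)
    return bytes(wrong.resized_file), bytes(intended.resized_file)
```

In this model the two variants differ only in the source buffer, which is why such a change is easy to overlook in review yet leaves the resized file with contents that recovery does not expect.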

Release Notes

Executing SET GLOBAL innodb_log_file_size would corrupt the InnoDB write-ahead log file.

How can this PR be tested?

Apparently, this could be covered better by mtr. We do have the test innodb.log_file_size_online that covers this functionality, and in fact that test started to fail a few pushes after the culprit 4ca355d863e3b6a14439eebbb2958afccb3548e3 was pushed (specifically, on f6989d1767ade3486af35246a8ad5be50507ca10). The bug is easy to catch in Random Query Generator based stress testing that includes killing and restarting the server during a write workload.

Basing the PR against the correct MariaDB version

  • [ ] This is a new feature or a refactoring, and the PR is based against the latest MariaDB development branch.
  • [x] This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • [x] I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • [ ] For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

dr-m avatar Aug 13 '24 10:08 dr-m

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

CLAassistant avatar Aug 13 '24 10:08 CLAassistant

Sorry, this is not that simple. I think that I need a way of reproducing the failure myself. I got an rr replay trace where the server was killed just after completing log resizing. Just before that, it would have written 512 bytes of 0xa5 (the TRASH_ALLOC value) to the log file at offset 0x3000.

dr-m avatar Aug 14 '24 12:08 dr-m