MDEV-29445 reorganise innodb buffer pool (and remove buffer pool chunks)
- [x] The Jira issue number for this PR is: MDEV-29445
Description
The buffer pool will be mapped in a contiguous memory area that will be aligned and divided into extents of 8 MiB on 64-bit systems and 2 MiB on 32-bit systems.
Within an extent, the first few innodb_page_size pages contain buf_block_t objects that will cover the page frames in the rest of the extent. In this way, there is a trivial mapping between page frames and block descriptors, and we do not need any lookup tables like buf_pool.zip_hash or buf_pool_t::chunk_t::map.
We will always allocate the same number of page frames for an extent, even if we do not need all the buf_block_t in the last extent in case the innodb_buffer_pool_size is not an integer multiple of the extent size.
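The extent layout described above can be sketched as follows. This is only an illustration, not the code in this PR: the descriptor size of 128 bytes is an assumed placeholder for sizeof(buf_block_t), and the names extent_size, page_size, desc_size, descriptor_of() and frame_of() are hypothetical. The sketch assumes the 64-bit extent size of 8 MiB and innodb_page_size=16k.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// All names and the descriptor size below are hypothetical.
constexpr size_t extent_size = size_t{8} << 20;  // 8 MiB extent (64-bit systems)
constexpr size_t page_size = size_t{16} << 10;   // innodb_page_size=16k
constexpr size_t desc_size = 128;                // assumed sizeof(buf_block_t)
constexpr size_t pages_per_extent = extent_size / page_size;

// Largest number of page frames whose descriptors fit in the
// leading pages of the same extent.
constexpr size_t compute_frames_per_extent()
{
  size_t n = pages_per_extent;
  while ((pages_per_extent - n) * page_size < n * desc_size)
    n--;
  return n;
}

constexpr size_t frames_per_extent = compute_frames_per_extent();
constexpr size_t header_pages = pages_per_extent - frames_per_extent;
constexpr uintptr_t extent_mask = ~(static_cast<uintptr_t>(extent_size) - 1);

// Map the address of a page frame to the address of its descriptor.
// Relies on every extent being aligned to extent_size.
inline uintptr_t descriptor_of(uintptr_t frame)
{
  const uintptr_t extent = frame & extent_mask;
  const size_t page_no = (frame - extent) / page_size;
  assert(page_no >= header_pages);  // the leading pages hold descriptors
  return extent + (page_no - header_pages) * desc_size;
}

// Map the address of a descriptor back to the address of its page frame.
inline uintptr_t frame_of(uintptr_t desc)
{
  const uintptr_t extent = desc & extent_mask;
  const size_t index = (desc - extent) / desc_size;
  assert(index < frames_per_extent);
  return extent + (header_pages + index) * page_size;
}
```

Under these assumed sizes, each extent would hold 4 descriptor pages followed by 508 page frames, and both mappings need only a mask, a division and a multiplication.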
my_large_virtual_alloc(): A new function, similar to my_large_malloc().
FIXME: On Microsoft Windows, let the caller know if large page allocation was used. In that case, we must disallow buffer pool resizing.
buf_pool_t::create(): Only initialize the first page descriptor of each chunk.
buf_pool_t::lazy_allocate(): Lazily initialize a previously allocated page descriptor and increase buf_pool.n_blocks, which must be below buf_pool.n_blocks_alloc.
innodb_init_param(): Refactored. We must first validate innodb_page_size and then determine the valid bounds of innodb_buffer_pool_size.
Release Notes
We deprecate and ignore the parameter innodb_buffer_pool_chunk_size and allow the buffer pool size to be changed in arbitrary 1-megabyte increments, all the way up to innodb_buffer_pool_size_max, which must be specified at startup.
If innodb_buffer_pool_size_max is not specified, it will default to twice the specified innodb_buffer_pool_size.
The minimum innodb_buffer_pool_size is 320 pages. At the default innodb_page_size=16k this corresponds to 5 MiB. However, now that the innodb_buffer_pool_size includes the memory allocated for the block descriptors, the minimum would now be innodb_buffer_pool_size=6m.
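The arithmetic behind these limits can be illustrated with a small sketch. It is not the validation code of this PR: the count of 4 descriptor pages per extent is an assumption (it depends on the real sizeof(buf_block_t)), rounding up to a whole MiB is inferred from the 1-megabyte increments mentioned above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t MiB = size_t{1} << 20;
constexpr size_t page_size = size_t{16} << 10;  // innodb_page_size=16k
constexpr size_t min_frames = 320;              // minimum number of page frames
constexpr size_t header_pages = 4;              // assumed descriptor pages per extent

// Round a byte count up to the next multiple of 1 MiB.
constexpr size_t round_up_MiB(size_t bytes)
{
  return (bytes + MiB - 1) / MiB * MiB;
}

// Smallest pool size once the descriptor pages are counted as well.
constexpr size_t min_pool_size()
{
  return round_up_MiB((min_frames + header_pages) * page_size);
}

// Default innodb_buffer_pool_size_max when it is not specified.
constexpr size_t default_size_max(size_t pool_size)
{
  return 2 * pool_size;
}

static_assert(min_frames * page_size == 5 * MiB, "320 pages of 16 KiB are 5 MiB");
static_assert(min_pool_size() == 6 * MiB, "descriptors push the minimum to 6 MiB");
```

For example, under these assumptions a server started with innodb_buffer_pool_size=128m and no innodb_buffer_pool_size_max would be resizable up to default_size_max(128 * MiB), that is, 256 MiB.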
Innodb_buffer_pool_resize_status will be removed. The SET GLOBAL innodb_buffer_pool_size operation will block until the buffer pool has been resized or the operation is aborted by KILL, SHUTDOWN or disconnect.
How can this PR be tested?
This is mostly covered by the regression test suite.
Large-page allocation needs to be tested on Linux and Microsoft Windows. Stress testing with lots of buffer pool resizing during the workload would be helpful.
Basing the PR against the correct MariaDB version
- [ ] This is a new feature and the PR is based against the latest MariaDB development branch.
- [ ] This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.
This is a new feature, but it depends on some refactoring that is currently based on the 10.6 branch.
PR quality check
- [x] I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
- [ ] For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.
There needs to be some proper functionality exported from mysys for manipulating virtual memory.
I'm aware that Linux is missing some of this functionality, but it provides something similar, so it should be better formalized, imo.
I believe the concept of "committed" memory is universal rather than a Windows-ism, but in case it is unfamiliar: committed memory is memory that is backed by physical memory or by the page file/swap ( https://stackoverflow.com/a/25911596/547065 )
Personally, I'd prefer 5 functions, but it could be reduced to 2 if taking VirtualAlloc as inspiration.
- reserve_virtual_memory(size): Reserve a virtual memory address range only. Accessing the memory is an error and should crash. Implementation: VirtualAlloc(MEM_RESERVE), mmap (with asan/MEM_UNDEFINED)
- commit_virtual_memory(ptr, size): Commit a previously reserved region of memory. After this, it is possible to read and write the region; it is now backed by pagefile/swap. Implementation: VirtualAlloc(MEM_COMMIT), madvise(MADV_WILLNEED), asan/MEM_DEFINED
- allocate_virtual_memory(size, bool large_pages): Reserve and commit in one step. This is the only function that works with large pages on Windows. Implementation: VirtualAlloc(MEM_COMMIT|MEM_RESERVE), possibly with MEM_LARGE_PAGES, mmap
- decommit_virtual_memory(ptr, size): Decommit a virtual memory address range only. Accessing the memory is an error again and should crash. Implementation: VirtualFree(MEM_DECOMMIT), madvise(MADV_DONTNEED), asan/MEM_UNDEFINED
- free_virtual_memory(ptr): Release the address range. Implementation: VirtualFree(MEM_RELEASE)/munmap
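To make the proposal more concrete, here is a minimal Linux-only sketch of the five functions. It is not existing mysys code, and it deviates from the list above in three labeled ways: it uses mprotect() to make reserved memory inaccessible and accessible again (the list suggests madvise and ASAN poisoning), it omits the large_pages parameter, and free_virtual_memory() takes a size because munmap() requires one.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Reserve address space only; PROT_NONE makes any access fault.
inline void *reserve_virtual_memory(size_t size)
{
  assert(size > 0);
  void *p = mmap(nullptr, size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Make a page-aligned part of a reserved range readable and writable.
inline bool commit_virtual_memory(void *ptr, size_t size)
{
  return mprotect(ptr, size, PROT_READ | PROT_WRITE) == 0;
}

// Reserve and commit in one step (large pages are not covered here).
inline void *allocate_virtual_memory(size_t size)
{
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Give the physical pages back and make the range fault on access again.
inline bool decommit_virtual_memory(void *ptr, size_t size)
{
  return madvise(ptr, size, MADV_DONTNEED) == 0 &&
         mprotect(ptr, size, PROT_NONE) == 0;
}

// Release the whole address range obtained from reserve/allocate.
inline bool free_virtual_memory(void *ptr, size_t size)
{
  return munmap(ptr, size) == 0;
}
```

A buffer pool built on such an interface could reserve innodb_buffer_pool_size_max bytes of address space at startup, commit only the current innodb_buffer_pool_size, and then commit or decommit ranges when the pool is resized.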