server icon indicating copy to clipboard operation
server copied to clipboard

MDEV-35049: Improve adaptive hash index scalability

Open dr-m opened this issue 1 year ago • 6 comments
trafficstars

  • [x] The Jira issue number for this PR is: MDEV-35049

Description

MEM_HEAP_BTR_SEARCH: Remove. Let us handle this special type of mem_heap_t allocations in the only compilation unit, btr0sea.cc, specifically, btr_search_sys_t::partition::insert(), which replaces the function ha_insert_for_fold().

mem_block_info_t::ahi_block: Replaces free_block. This caches one buffer page for use in adaptive hash index allocations. This is protected by btr_search_sys_t::partition::latch. It only is Atomic_relaxed because btr_search_free_space() is following a pattern of test, lock, and test.

btr_search_check_free_space(): Protect the ahi_block with a shared AHI partition latch instead of an exclusive one. We must recheck btr_search_enabled after acquiring the latch in order to avoid a race condition with btr_search_disable(). Using a shared latch instead of an exclusive one should reduce contention with btr_search_guess_on_hash() and other operations when running with innodb_adaptive_hash_index=ON.

Release Notes

The performance of innodb_adaptive_hash_index with larger numbers of threads was improved at high concurrency.

How can this PR be tested?

./mtr --parallel=auto --big-test --force --mysqld=--loose-innodb-adaptive-hash-index=ON --max-test-fail=0

I tested this with and without cmake -DWITH_INNODB_AHI=OFF.

Using the innodb-hashtest.sh I still see some waits for an AHI partition latch. Here are all functions that exceed 1% of perf record samples:

  27,97%  mariadbd  mariadbd             [.] btr_search_guess_on_hash(dict_index_t*, btr_search_t*, dtuple_t const*, unsigned long, unsigned long, btr_cur_t*, mtr_t*)
  21,10%  mariadbd  mariadbd             [.] ssux_lock_impl<true>::rd_wait()
   2,93%  mariadbd  mariadbd             [.] btr_search_sys_t::partition::insert(unsigned long, unsigned char const*)
   2,89%  mariadbd  mariadbd             [.] row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)
   2,17%  mariadbd  mariadbd             [.] void rec_init_offsets_comp_ordinary<false, false>(unsigned char const*, dict_index_t const*, unsigned short*, unsigned long, dict_col_t::def_t const*, rec_leaf_format)
   1,92%  mariadbd  [kernel.kallsyms]    [k] psi_group_change
   1,75%  mariadbd  [kernel.kallsyms]    [k] try_to_wake_up
   1,47%  mariadbd  mariadbd             [.] btr_search_drop_page_hash_index(buf_block_t*, bool)
   1,36%  mariadbd  mariadbd             [.] ssux_lock_impl<true>::wr_wait(unsigned int)
   1,34%  mariadbd  mariadbd             [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned short*, unsigned long, unsigned long, mem_block_info_t**)
   1,31%  mariadbd  mariadbd             [.] ha_delete_hash_node(hash_table_t*, mem_block_info_t*, ha_node_t*)
   1,31%  mariadbd  mariadbd             [.] Row_sel_get_clust_rec_for_mysql::operator()(row_prebuilt_t*, dict_index_t*, unsigned char const*, que_thr_t*, unsigned char const**, unsigned short**, mem_block_info_t**, dtuple_t**, mtr_t*)
   1,22%  mariadbd  mariadbd             [.] btr_cur_t::search_leaf(dtuple_t const*, page_cur_mode_t, btr_latch_mode, mtr_t*)
   1,16%  mariadbd  mariadbd             [.] evaluate_join_record(JOIN*, st_join_table*, int)
   1,15%  mariadbd  mariadbd             [.] cmp_dtuple_rec_with_match_low(dtuple_t const*, unsigned char const*, unsigned short const*, unsigned long, unsigned long*)
   1,06%  mariadbd  [vdso]               [.] __vdso_gettimeofday

It should be noted that conflicts are inevitable under this workload, because ha_delete_hash_node() is covered by an exclusive AHI partition latch.

I repeated the experiment with an additional patch that disables the spin loops in this subsystem:

diff --git a/storage/innobase/btr/btr0cur.cc b/storage/innobase/btr/btr0cur.cc
index 9fb5bb4597a..f8c19922729 100644
--- a/storage/innobase/btr/btr0cur.cc
+++ b/storage/innobase/btr/btr0cur.cc
@@ -2620,7 +2620,7 @@ btr_cur_optimistic_insert(
 		ut_ad(flags == BTR_NO_LOCKING_FLAG);
 	} else if (index->table->is_temporary()) {
 	} else {
-		srw_spin_lock* ahi_latch = btr_search_sys.get_latch(*index);
+		auto ahi_latch = btr_search_sys.get_latch(*index);
 		if (!reorg && cursor->flag == BTR_CUR_HASH) {
 			btr_search_update_hash_node_on_insert(
 				cursor, ahi_latch);
@@ -3402,7 +3402,7 @@ btr_cur_update_in_place(
 
 #ifdef BTR_CUR_HASH_ADAPT
 	{
-		srw_spin_lock* ahi_latch = block->index
+		auto ahi_latch = block->index
 			? btr_search_sys.get_latch(*index) : NULL;
 		if (ahi_latch) {
 			/* TO DO: Can we skip this if none of the fields
diff --git a/storage/innobase/btr/btr0sea.cc b/storage/innobase/btr/btr0sea.cc
index 968ca1993d2..4fd63a4c31e 100644
--- a/storage/innobase/btr/btr0sea.cc
+++ b/storage/innobase/btr/btr0sea.cc
@@ -34,6 +34,8 @@ Created 2/17/1996 Heikki Tuuri
 #include "btr0btr.h"
 #include "srv0mon.h"
 
+#define srw_spin_lock srw_lock
+
 /** Is search system enabled.
 Search system is protected by array of latches. */
 char		btr_search_enabled;
diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc
index be721a51f3c..6368ec62909 100644
--- a/storage/innobase/handler/handler0alter.cc
+++ b/storage/innobase/handler/handler0alter.cc
@@ -5898,7 +5898,7 @@ static bool innobase_instant_try(
 #ifdef BTR_CUR_HASH_ADAPT
 	/* Acquire the ahi latch to avoid a race condition
 	between ahi access and instant alter table */
-	srw_spin_lock* ahi_latch = btr_search_sys.get_latch(*index);
+	auto ahi_latch = btr_search_sys.get_latch(*index);
 	ahi_latch->wr_lock(SRW_LOCK_CALL);
 #endif /* BTR_CUR_HASH_ADAPT */
 	const bool metadata_changed = ctx->instant_column();
diff --git a/storage/innobase/include/btr0sea.h b/storage/innobase/include/btr0sea.h
index a8d746bf8ce..8feb892395a 100644
--- a/storage/innobase/include/btr0sea.h
+++ b/storage/innobase/include/btr0sea.h
@@ -36,6 +36,8 @@ Created 2/17/1996 Heikki Tuuri
 extern mysql_pfs_key_t btr_search_latch_key;
 #endif /* UNIV_PFS_RWLOCK */
 
+#define srw_spin_lock srw_lock
+
 #define btr_search_sys_create() btr_search_sys.create()
 #define btr_search_sys_free() btr_search_sys.free()
 
@@ -354,6 +356,7 @@ again set this much timeout. This is to reduce contention. */
 #define BTR_SEA_TIMEOUT			10000
 #endif /* BTR_CUR_HASH_ADAPT */
 
+#undef srw_spin_lock
 #include "btr0sea.inl"
 
 #endif

With this additional patch, the spin loop is gone:

  28,01%  mariadbd  mariadbd             [.] btr_search_guess_on_hash(dict_index_t*, btr_search_t*, dtuple_t const*, unsigned long, unsigned long, btr_cur_t*, mtr_t*)
  19,94%  mariadbd  mariadbd             [.] btr_search_sys_t::partition::insert(unsigned long, unsigned char const*)
  15,99%  mariadbd  mariadbd             [.] btr_search_drop_page_hash_index(buf_block_t*, bool)
  15,03%  mariadbd  mariadbd             [.] ha_delete_hash_node(hash_table_t*, mem_block_info_t*, ha_node_t*)
   1,25%  mariadbd  mariadbd             [.] void rec_init_offsets_comp_ordinary<false, false>(unsigned char const*, dict_index_t const*, unsigned short*, unsigned long, dict_col_t::def_t const*, rec_leaf_format)
   1,23%  mariadbd  mariadbd             [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned short*, unsigned long, unsigned long, mem_block_info_t**)
   0,88%  mariadbd  mariadbd             [.] row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)
   0,86%  mariadbd  mariadbd             [.] btr_cur_t::search_leaf(dtuple_t const*, page_cur_mode_t, btr_latch_mode, mtr_t*)
   0,78%  mariadbd  libc.so.6            [.] __memcmp_avx2_movbe
   0,66%  mariadbd  mariadbd             [.] cmp_dtuple_rec_with_match_low(dtuple_t const*, unsigned char const*, unsigned short const*, unsigned long, unsigned long*)
   0,52%  mariadbd  mariadbd             [.] cmp_data(unsigned long, unsigned long, unsigned char const*, unsigned long, unsigned char const*, unsigned long)
   0,52%  mariadbd  mariadbd             [.] mtr_memo_slot_t::release() const
   0,47%  mariadbd  mariadbd             [.] trx_purge(unsigned long, unsigned long)
   0,45%  mariadbd  mariadbd             [.] ssux_lock_impl<false>::rd_wait()
   0,44%  mariadbd  mariadbd             [.] btr_search_check_guess(btr_cur_t*, bool, dtuple_t const*, unsigned long) [clone .constprop.0]
   0,42%  mariadbd  mariadbd             [.] mtr_t::memset(buf_block_t const*, unsigned long, unsigned long, unsigned char)
   0,41%  mariadbd  [vdso]               [.] __vdso_gettimeofday

The ssux_lock_impl<false>::rd_wait() above will involve context switches due to futex system calls.

This needs to be tested with a larger workload that originally reproduced the scalability issue.

Basing the PR against the correct MariaDB version

  • [ ] This is a new feature or a refactoring, and the PR is based against the main branch.
  • [x] This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

This should be directly applicable to 10.5, but the patch is currently is based on the 10.6 branch. For 10.5, we might also limit the change to btr_search_check_free_space_in_heap().

PR quality check

  • [x] I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • [ ] For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

dr-m avatar Oct 01 '24 09:10 dr-m