yugabyte-db
[DocDB] Node failed + core dump on load running against node
Jira Link: DB-11254
Description
Case:
- Run SqlUpdate, SqlDataload, SqlSecondaryIndex, SqlSnapshotTxns, SqlForeignKeyAndJons sequentially against a 3-node RF=3 cluster (c6g.xlarge, 4 CPUs, 8 GB RAM)
- After 1-2 minutes, one node fails and produces a core dump:
(lldb) target create "/home/yugabyte/yb-software/yugabyte-2.23.0.0-b296-almalinux8-aarch64/postgres/bin/postgres" --core "/home/yugabyte/cores/core_31660_1715143194_!home!yugabyte!yb-software!yugabyte-2.23.0.0-b296-almalinux8-aarch64!postgres!bin!postgres"
Core file '/home/yugabyte/cores/core_31660_1715143194_!home!yugabyte!yb-software!yugabyte-2.23.0.0-b296-almalinux8-aarch64!postgres!bin!postgres' (aarch64) was loaded.
(lldb) bt all
* thread #1, name = 'postgres', stop reason = signal SIGSEGV: address not mapped to object
* frame #0: 0x0000ffff971bd5a4 libyb_pggate_webserver.so`std::__1::__hash_const_iterator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, void*>*> std::__1::__hash_table<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>::find<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>(this=<unavailable>, __k=<unavailable>) const at __hash_table:2168:31
frame #1: 0x0000ffff971bc77c libyb_pggate_webserver.so`yb::Status yb::PrometheusWriter::WriteSingleEntryNonTable<unsigned long>(std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned long const&) [inlined] std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>::find[abi:v170002](this=<unavailable>, __k="table_id") const at unordered_map:1534:69
frame #2: 0x0000ffff971bc778 libyb_pggate_webserver.so`yb::Status yb::PrometheusWriter::WriteSingleEntryNonTable<unsigned long>(this=0x0000ffff8c6ad020, attr=<unavailable>, name="yb_ysqlserver_active_connection_total", value=0x0000ffff8c6ad2d8) at metrics_writer.h:44:20
frame #3: 0x0000ffff971b9cc0 libyb_pggate_webserver.so`yb::pggate::PgPrometheusMetricsHandler(yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*) at pgsql_webserver_wrapper.cc:96:3
frame #4: 0x0000ffff971b9b58 libyb_pggate_webserver.so`yb::pggate::PgPrometheusMetricsHandler(req=<unavailable>, resp=<unavailable>) at pgsql_webserver_wrapper.cc:529:3
frame #5: 0x0000ffff971283c0 libserver_process.so`yb::Webserver::Impl::RunPathHandler(yb::Webserver::Impl::PathHandler const&, sq_connection*, sq_request_info*) [inlined] std::__1::__function::__value_func<void (yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*)>::operator()[abi:v170002](this=0x00000688ff810900, __args=0x0000ffff8c6af540, __args=0x0000ffff8c6ad420) const at function.h:517:16
frame #6: 0x0000ffff971283a4 libserver_process.so`yb::Webserver::Impl::RunPathHandler(yb::Webserver::Impl::PathHandler const&, sq_connection*, sq_request_info*) [inlined] std::__1::function<void (yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*)>::operator()(this= Function = yb::pggate::PgPrometheusMetricsHandler(yb::WebCallbackRegistry::WebRequest const&, yb::WebCallbackRegistry::WebResponse*) , __arg=0x0000ffff8c6af540, __arg=0x0000ffff8c6ad540) const at function.h:1168:12
frame #7: 0x0000ffff971283a4 libserver_process.so`yb::Webserver::Impl::RunPathHandler(this=0x00000688ffdfc500, handler=0x00000688ffdcf810, connection=0x00000688ff989000, request_info=<unavailable>) at webserver.cc:648:5
frame #8: 0x0000ffff97127ca0 libserver_process.so`yb::Webserver::Impl::BeginRequestCallback(this=0x00000688ffdfc500, connection=<unavailable>, request_info=0x00000688ff989000) at webserver.cc:567:33
frame #9: 0x0000ffff97133114 libserver_process.so`worker_thread + 5524
frame #10: 0x0000ffffa97d78b8 libpthread.so.0`start_thread + 392
frame #11: 0x0000ffffa9673afc libc.so.6`thread_start + 12
thread #2, stop reason = signal 0
frame #0: 0x0000ffffa97ddc58 libpthread.so.0`pthread_cond_wait@@GLIBC_2.17 + 528
frame #1: 0x0000ffff97131b08 libserver_process.so`master_thread + 1416
frame #2: 0x0000ffffa97d78b8 libpthread.so.0`start_thread + 392
frame #3: 0x0000ffffa9673afc libc.so.6`thread_start + 12
thread #3, stop reason = signal 0
frame #0: 0x0000aaaaac3b8ffc postgres`__do_fini
frame #1: 0x0000ffffaab74cd4 ld-linux-aarch64.so.1`_dl_fini at dl-fini.c:141:9
frame #2: 0x0000ffffa968899c libc.so.6`__run_exit_handlers + 252
frame #3: 0x0000ffffa9688b1c libc.so.6`exit + 28
frame #4: 0x0000aaaaac8900f4 postgres`proc_exit(code=0) at ipc.c:157:2
frame #5: 0x0000ffff96f93840 yb_pg_metrics.so`webserver_worker_main(unused=<unavailable>) at yb_pg_metrics.c:443:3
frame #6: 0x0000aaaaac7e9204 postgres`StartBackgroundWorker at bgworker.c:849:2
frame #7: 0x0000aaaaac802594 postgres`maybe_start_bgworkers [inlined] do_start_bgworker(rw=0x00000688ffd102c0) at postmaster.c:6100:4
frame #8: 0x0000aaaaac802538 postgres`maybe_start_bgworkers at postmaster.c:6326:9
frame #9: 0x0000aaaaac7feadc postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) at postmaster.c:1432:2
frame #10: 0x0000aaaaac6f8544 postgres`PostgresServerProcessMain(argc=25, argv=0x00000688ffd120d0) at main.c:234:3
frame #11: 0x0000aaaaac3b90b8 postgres`main + 36
frame #12: 0x0000ffffa9674384 libc.so.6`__libc_start_main + 220
frame #13: 0x0000aaaaac3b8f74 postgres`_start + 52
Version: 2.23.0.0-b296
Logs in JIRA task
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
- [X] I confirm this issue does not contain any sensitive information.
@pilshchikov, do you think this is a recent regression? Are we able to triangulate/narrow down the builds?
@rthallamko3 it started happening somewhere between 2.23.0.0-b247 and 2.23.0.0-b265 on the master branch, and between 2024.1.0.0-b104 and 2024.1.0.0-b122 on the 2024.1 branch
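For narrowing the window further, here is a minimal bisection sketch in Python. It assumes the lower build of a range (for example b247) does not reproduce the crash, the upper build (for example b265) does, and that every build after the first bad one also fails. The `is_bad` callback is hypothetical: it stands for "deploy this build, run the workload, check for a core dump" and is not an existing tool.

```python
from typing import Callable


def first_bad_build(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Return the lowest build number in (good, bad] that reproduces the failure.

    Assumptions (not confirmed in this issue): `good` is known to pass,
    `bad` is known to fail, and failures are monotonic across build numbers.
    """
    lo, hi = good, bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid  # failure reproduces: first bad build is mid or earlier
        else:
            lo = mid  # no failure: first bad build is after mid
    return hi


if __name__ == "__main__":
    # Placeholder predicate for illustration only; 256 is an arbitrary
    # threshold, not an actual finding about the regression.
    pretend_is_bad = lambda build: build >= 256
    print(first_bad_build(247, 265, pretend_is_bad))  # → 256
```

With the b247..b265 window on master this takes at most five workload runs; the same function applies to the b104..b122 window on 2024.1.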
Duplicate of https://github.com/yugabyte/yugabyte-db/issues/17847
Per Yusong, closing this as DUP of https://github.com/yugabyte/yugabyte-db/issues/17847