resiliency can be improved when facing constrained memory situations
Environment Information
- PMDK package version(s): 1.12.1
- OS(es) version(s): CentOS 7.9
- ndctl version(s): ndctl-65-5
- kernel version(s): 3.10.0-1160.45.1.el7.x86_64
- compiler, libraries, packaging and other related tools version(s): DAOS v2.3
Please provide a reproduction of the bug:
Running DAOS Servers as a simple user with memory limits.
How often bug is revealed: (always, often, rare): rare
Actual behavior:
Silent abort() or SEGV, leading to unnecessary/wasted time spent to debug.
Expected behavior:
graceful handling.
Details
1st case/abort() encountered with a stack trace like following :
#0 0x00002aeb80a98387 in raise () from /lib64/libc.so.6
#1 0x00002aeb80a99a78 in abort () from /lib64/libc.so.6
#2 0x00002aeb803d48bc in out_get_errormsg () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/pmdk/lib/libpmemobj.so.1
#3 0x00002aeb803d4978 in out_err () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/pmdk/lib/libpmemobj.so.1
#4 0x00002aeb803d4fcc in ravl_emplace () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/pmdk/lib/libpmemobj.so.1
#5 0x00002aeb803f720e in pmemobj_tx_add_common.constprop.21 () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/pmdk/lib/libpmemobj.so.1
#6 0x00002aeb803f814c in pmemobj_tx_add_range_direct () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/pmdk/lib/libpmemobj.so.1
#7 0x00002aeb7ec7fadf in pmem_tx_add_ptr (umm=<optimized out>, ptr=<optimized out>, size=<optimized out>) at src/common/mem.c:133
#8 0x00002aeb7f4a0823 in umem_tx_add_ptr (size=8, ptr=0x2af703cd2bd8, umm=0x564a24f6fb80) at src/include/daos/mem.h:470
#9 ilog_ptr_set_full (lctx=lctx@entry=0x564a24f6fb30, dest=dest@entry=0x2af703cd2bd8, src=src@entry=0x2aedb9aa3490, len=len@entry=8) at src/vos/ilog.c:345
#10 0x00002aeb7f4a0b95 in update_inplace (lctx=0x564a24f6fb30, id_out=0x2af703cd2bd8, id_in=0x2aedb9aa3660, opc=1, is_equal=<optimized out>) at src/vos/ilog.c:661
#11 0x00002aeb7f4a11af in ilog_modify (loh=..., id_in=id_in@entry=0x2aedb9aa3660, epr=epr@entry=0x2aedb9aa35e0, opc=opc@entry=1) at src/vos/ilog.c:926
#12 0x00002aeb7f4a411e in ilog_persist (loh=..., id=id@entry=0x2aedb9aa3660) at src/vos/ilog.c:1017
#13 0x00002aeb7f485ef4 in dtx_ilog_rec_release (abort=96, dae=0x2aedbd922838, rec=<optimized out>, cont=0x0, umm=0x2aedbd922838) at src/vos/vos_dtx.c:561
#14 do_dtx_rec_release (umm=umm@entry=0x2aedbd5ae2b8, cont=cont@entry=0x2aedbd6efe20, dae=dae@entry=0x2aedbd922838, rec=<optimized out>, abort=abort@entry=false) at src/vos/vos_dtx.c:583
#15 0x00002aeb7f4861ba in do_dtx_rec_release (abort=false, rec=<optimized out>, dae=0x2aedbd922838, cont=0x2aedbd6efe20, umm=0x2aedbd5ae2b8) at src/vos/vos_dtx.c:578
#16 dtx_rec_release (cont=cont@entry=0x2aedbd6efe20, dae=0x2aedbd922838, abort=abort@entry=false) at src/vos/vos_dtx.c:688
#17 0x00002aeb7f48a02c in vos_dtx_commit_one (fatal=<synthetic pointer>, rm_cos=0x2aedbfe5745d, dae_p=0x2aedbfe92a58, dce_p=<synthetic pointer>, cmt_time=946528699681341440, epoch=0, dti=0x2aedbff49588, cont=0x2aedbd6efe20)
at src/vos/vos_dtx.c:856
#18 vos_dtx_commit_internal (cont=cont@entry=0x2aedbd6efe20, dtis=dtis@entry=0x2aedbff47050, count=count@entry=512, epoch=epoch@entry=0, rm_cos=rm_cos@entry=0x2aedbfe572d0, daes=daes@entry=0x2aedbfe91df0, dces=dces@entry=0x2aedbff4a060)
at src/vos/vos_dtx.c:1949
#19 0x00002aeb7f48d058 in vos_dtx_commit (coh=..., dtis=dtis@entry=0x2aedbff47050, count=count@entry=512, rm_cos=rm_cos@entry=0x2aedbfe572d0) at src/vos/vos_dtx.c:2140
#20 0x00002aeb8b944e97 in dtx_commit (cont=cont@entry=0x2aedbd6efa50, dtes=0x2aedbff2eef0, dcks=0x2aedbff43040, count=count@entry=512, epoch=epoch@entry=0) at src/dtx/dtx_rpc.c:813
#21 0x00002aeb8b950a2c in dtx_batched_commit_one (arg=0x2aedbd6f0d80) at src/dtx/dtx_common.c:518
#22 0x00002aeb7ff7f4aa in ABTD_ythread_func_wrapper () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/argobots/lib/libabt.so.1
#23 0x00002aeb7ff7f651 in make_fcontext () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/argobots/lib/libabt.so.1
#24 0x0000000000000000 in ?? ()
Further corefile analysis along with associated pmdk/libpmemobj source code review indicates that the abort()/FATAL() comes from the error-return handling from the malloc() call in Last_errormsg_get() :
static inline struct errormsg *
Last_errormsg_get(void)
{
Last_errormsg_key_alloc();
struct errormsg *errormsg = os_tls_get(Last_errormsg_key);
if (errormsg == NULL) {
errormsg = malloc(sizeof(struct errormsg));
if (errormsg == NULL)
FATAL("!malloc");
/* make sure it contains empty string initially */
errormsg->msg[0] = '\0';
int ret = os_tls_set(Last_errormsg_key, errormsg);
if (ret)
FATAL("!os_tls_set");
}
return errormsg;
}
"src/core/out.c" [readonly] line 69 of 598 --11%-- col 3-10
and more source code analysis also indicates that this behaviour could be avoided if these allocation/initialisation stuff could be done upstream during out_init(). What do you think ??
2nd case/SEGV encountered with a stack trace like following :
#0 0x00002b6a9e82c42f in ucs_debug_backtrace_next (bckt=0x0, line=line@entry=0x2b950edea8c0) at debug/debug.c:495
#1 0x00002b6a9e82c6a7 in ucs_debug_print_backtrace (stream=0x2b6a98b3a1c0 <_IO_2_1_stderr_>, strip=strip@entry=2) at debug/debug.c:656
#2 0x00002b6a9e82e945 in ucs_handle_error (message=message@entry=0x2b6a9e8f0d27 "tkill(2) or tgkill(2)") at debug/debug.c:1081
#3 0x00002b6a9e82ed0c in ucs_debug_handle_error_signal (signo=signo@entry=11, cause=0x2b6a9e8f0d27 "tkill(2) or tgkill(2)", fmt=fmt@entry=0x2b6a9e8f0f8b " at address %p") at debug/debug.c:1033
#4 0x00002b6a9e82ef7b in ucs_error_signal_handler (signo=11, info=0x2b950edead30, context=<optimized out>) at debug/debug.c:1055
#5 <signal handler called>
#6 0x00002b6a9810266c in palloc_defer_free_create () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/pmdk/lib/libpmemobj.so.1
#7 0x00002b6a981090a1 in pmemobj_tx_xfree () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/pmdk/lib/libpmemobj.so.1
#8 0x00002b6a9698fbae in pmem_tx_free (umm=0x2b6d155ae2b8, umoff=72350928) at src/common/mem.c:92
#9 0x00002b6a971960e2 in dtx_rec_release (cont=cont@entry=0x2b6d156efe20, dae=0x2b6d174b4d60, abort=abort@entry=true) at src/vos/vos_dtx.c:665
#10 0x00002b6a9719d9ea in vos_dtx_abort (coh=..., dti=0x2b6d17f87cf0, epoch=epoch@entry=946528706794618887) at src/vos/vos_dtx.c:2204
#11 0x00002b6aa3654aea in dtx_abort (cont=cont@entry=0x2b6d156efa50, dte=dte@entry=0x2b6d17f87cf0, epoch=946528706794618887) at src/dtx/dtx_rpc.c:896
#12 0x00002b6aa3664382 in dtx_leader_end (dlh=0x2b6d17f87cf0, coh=<optimized out>, result=<optimized out>) at src/dtx/dtx_common.c:1290
#13 0x00002b6aa3d9d114 in ds_obj_rw_handler (rpc=0x2b6d17db1db0) at src/object/srv_obj.c:2732
#14 0x00002b6a96ed1f68 in crt_handle_rpc (arg=0x2b6d17db1db0) at src/cart/crt_rpc.c:1654
#15 0x00002b6a97c8f4aa in ABTD_ythread_func_wrapper () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/argobots/lib/libabt.so.1
#16 0x00002b6a97c8f651 in make_fcontext () from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../prereq/release/argobots/lib/libabt.so.1
#17 0x00002b950ecec000 in ?? ()
#18 0x0000000000100000 in ?? ()
#19 0x00002b6a96ed1f20 in ?? () at src/cart/crt_rpc.c:663 from /work2/08126/dbohninx/frontera/BUILDS/daos-11940_pr10837/20221116/daos/install/bin/../lib64/libcart.so.4
#20 0x00002b6d17db1db0 in ?? ()
#21 0x00002b950edebfe0 in ?? ()
#22 0x00002b950edebfe0 in ?? ()
#23 0x000055e1455968d0 in ?? ()
#24 0x000055e14503f5f0 in spi_key_cmp (htable=<optimized out>, rlink=<optimized out>, key=<optimized out>, len=<optimized out>) at src/engine/sched.c:289
#25 0x0000000000000000 in ?? ()
Further corefile analysis along with associated pmdk/libpmemobj source code review indicates that the SEGV has occurred due to a 0xffffffffffffff80 invalid action pointer being returned by tx_action_add() and to be passed as 3rd parameter to palloc_defer_free() in pmemobj_tx_xfree(), and looking more into the associated code (in core/out.c, libpmemobj/palloc.c, common/mem.c, libpmemobj/tx.c, common/vec.h), this should be caused by the fact tx->actions->buffer value is NULL due to a previous Realloc() error in vec_reserve(). For this issue, I believe this could be handled by testing the VEC_INC_BACK() negative return value in tx_action_add() and thus have this latter to return NULL in this case instead of an invalid pointer. What do you think ?
I would be happy to push a PR for both issues if you agree with my analysis, just let me know.
Additional information about Priority and Help Requested:
Are you willing to submit a pull request with a proposed change? Yes
Requested priority: Medium
- This structure is allocated on the first call for the thread. I don't think it's possible to preallocate it in
out_initunless you specifically know the number of threads that will ever call this function. Maybe we should have a statically allocated special const string that we return from theLast_errormsg_getin case of an error? - Yes, checking the return value of
VEC_INC_BACKin thetx_action_addshould solve the problem.
Unfortunately, there are also other places where we use VEC_PUSH_BACK without proper error-checking. Most of them are relatively easy to fix, but there's one where there's a potential error in the tx_commit path (where we cannot fail, and crashing is likely preferable). https://github.com/pmem/pmdk/blob/06f10e796724ac3599f70ac2fa802df3c6d90b9f/src/libpmemobj/tx.c#L1017 https://github.com/pmem/pmdk/blob/06f10e796724ac3599f70ac2fa802df3c6d90b9f/src/libpmemobj/memops.c#L617
I think the fix here is to preallocate the next vector in tx_construct_user_buffer, somewhere around here:
https://github.com/pmem/pmdk/blob/06f10e796724ac3599f70ac2fa802df3c6d90b9f/src/libpmemobj/tx.c#L708
- This structure is allocated on the first call for the thread. I don't think it's possible to preallocate it in
out_initunless you specifically know the number of threads that will ever call this function. Maybe we should have a statically allocated special const string that we return from theLast_errormsg_getin case of an error?
Ah yes, I forgot to check if out_init() was also executed in the context of each thread !!... I will try to find an other place in the early steps of each thread.
- Yes, checking the return value of
VEC_INC_BACKin thetx_action_addshould solve the problem.
So, should I at least fix this one this way ?
Unfortunately, there are also other places where we use
VEC_PUSH_BACKwithout proper error-checking. Most of them are relatively easy to fix, but there's one where there's a potential error in the tx_commit path (where we cannot fail, and crashing is likely preferable).
I will have a look to these other "simple" cases and to fix them the same way.
https://github.com/pmem/pmdk/blob/06f10e796724ac3599f70ac2fa802df3c6d90b9f/src/libpmemobj/tx.c#L1017
https://github.com/pmem/pmdk/blob/06f10e796724ac3599f70ac2fa802df3c6d90b9f/src/libpmemobj/memops.c#L617
I think the fix here is to preallocate the
nextvector intx_construct_user_buffer, somewhere around here: https://github.com/pmem/pmdk/blob/06f10e796724ac3599f70ac2fa802df3c6d90b9f/src/libpmemobj/tx.c#L708
Sorry, I don't get this. Is this related to the very special case you stated in the "tx_commit path" ??
- I'm not aware of any portable and reliable way of hooking into thread construction. I'd say it would be easier to just return
NULLinout_get_errormsgand handle that inout_error(by printing some static string saying that there's no memory to properly format error strings). - Yes.
- Yes, that's my proposal for how to fix the potential out-of-memory error that might happen in
tx_commit.
Sorry to be late on this...
- I'm not aware of any portable and reliable way of hooking into thread construction. I'd say it would be easier to just return NULL in out_get_errormsg and handle that in out_error (by printing some static string saying that there's no memory to properly format error strings).
Ok, after looking more how our DAOS code uses PMDK code, I understand that threads creation is not handled in PMDK, so I better have to follow your idea ;-) See PR-5536
- Yes.
And including the other occurrences you had pointed already... This is in PR-5537
- Yes, that's my proposal for how to fix the potential out-of-memory error that might happen in tx_commit.
Hmm, I have tried to do something in this direction also in PR-5537 ...