libredfish
Core file in libredfish 1.3.4
While using libredfish, a core was produced. I believe this happened when an event was generated and delivered to a service that had already been destroyed.
Program terminated with signal SIGABRT, Aborted.
(gdb) bt full
#0 __pthread_clockjoin_ex (threadid=2699005728, thread_return=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at pthread_join_common.c:145
_a1 = -1595961464
_nr = 240
_a3tmp = 21545
_a3 = 21545
_a4tmp = 0
_a2tmp = 0
_a2 = 0
_a4 = 0
__ret =
@pboyd04 any thoughts on this?
I was able to get further info on this; it's rare but still happens. I'm not sure how we get to a refCount = 0 before the listener is killed. This is still using 1.3.4.
#0 queuePush (q=0xa3f01df0, value=value@entry=0xa491f118) at ../../../../../../../../../externalsrc/libredfish/src/queue.c:81
node = 0x0
#1 0xa6c5bca0 in addWorkItemToQueue (service=0xa3f00a58, wi=0xa491f118)
at ../../../../../../../../../externalsrc/libredfish/src/asyncEvent.c:796
No locals.
#2 addEventToQueue (service=0xa3f00a58, event=0xa491d588, copy=true) at ../../../../../../../../../externalsrc/libredfish/src/asyncEvent.c:796
ret =
#4 0xa6c5c1ec in tcpThread (args=0x227ad50) at ../../../../../../../../../externalsrc/libredfish/src/asyncEvent.c:682

(gdb) print *data->service
$2 = {host = 0x0, queue = 0x0, asyncThread = 2739920672, curl = 0x0, versions = 0x0, flags = 1,
  sessionToken = 0x0, bearerToken = 0x0, otherAuth = 0x0,
  pthread_mutex_t = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0, __nusers = 0,
      {__spins = 0, __list = {__next = 0x0}}}, __size = '\000' <repeats 23 times>, __align = 0},
  refCount = 0, selfTerm = false, eventThreadQueue = 0xa3f01df0, eventThread = 2723135264,
  sseThread = 0, tcpThread = 2706349856, tcpSocket = -1, eventTerm = true,
  eventRegistrationUri = 0x22a6068 "p\026 \002", zeroMQListener = 0x0, sessionUri = 0x0, freeing = 232}
Things that look important to me in that service:

refCount = 0
eventTerm = true    // if we are terminating why are we queueing this?
sessionToken = 0x0  // we got pretty far in the kill of the
I'd question whether this service is even valid anymore. This might be a race?
Yeah, I'm pretty sure this is a race condition in the caller. Not sure how to fix it in the library.
Closing; no further info to help solve this at the library level...