
Core file in libredfish 1.3.4

Open cowboyjeeper opened this issue 3 years ago • 4 comments

While using libredfish, a core file was produced. I believe it happened when an event was generated and returned, potentially to a service that had already been destroyed.

Program terminated with signal SIGABRT, Aborted.

(gdb) bt full
#0 __pthread_clockjoin_ex (threadid=2699005728, thread_return=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at pthread_join_common.c:145
    _a1 = -1595961464
    _nr = 240
    _a3tmp = 21545
    _a3 = 21545
    _a4tmp = 0
    _a2tmp = 0
    _a2 = 0
    _a4 = 0
    __ret =
    __oldtype = 0
    tid = 21545
    _buffer = {__routine = 0xa6af75dc , __arg = 0xa0df913c, __canceltype = 16769168, __prev = 0x0}
    pd = 0xa0df8f20
    self =
    result = 0
    pd_result =
#1 0xa6af7564 in __pthread_join (threadid=, thread_return=) at pthread_join.c:24
    No locals.
#2 0xa6cfae3c in createRequest (url=0x1 <error: Cannot access memory at address 0x1>, method=HTTP_POST, bodysize=16769168, body=0xa6d12d04 "\344;\002") at ../../../../../../../../../externalsrc/libredfish/src/asyncRaw.c:40
    ret = 0xa4000ab0
#3 0xa6cf41a4 in getPayloadBody (payload=0xa0df8f88) at ../../../../../../../../../externalsrc/libredfish/src/payload.c:200
    No locals.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

cowboyjeeper, Sep 10 '21 19:09

@pboyd04 any thoughts on this?

mraineri, Sep 16 '21 17:09

I was able to get further info on this. It's rare, but it still happens. I'm not sure how we get to refCount = 0 before the listener is killed. This is still using 1.3.4.

#0 queuePush (q=0xa3f01df0, value=value@entry=0xa491f118) at ../../../../../../../../../externalsrc/libredfish/src/queue.c:81
    node = 0x0
#1 0xa6c5bca0 in addWorkItemToQueue (service=0xa3f00a58, wi=0xa491f118) at ../../../../../../../../../externalsrc/libredfish/src/asyncEvent.c:796
    No locals.
#2 addEventToQueue (service=0xa3f00a58, event=0xa491d588, copy=true) at ../../../../../../../../../externalsrc/libredfish/src/asyncEvent.c:796
    ret =
    wi = 0xa491f118
#3 0xa6c5c098 in listenTCP (data=, data=) at ../../../../../../../../../externalsrc/libredfish/src/asyncEvent.c:667
    tmpSock = 30
    buffer = "POST /events/IOM/C1 HTTP/1.1\r\nHost: localhost\r\ncontent-type: application/json\r\nAUTHNZ_USER: (null)\r\nUSER_PRIV: (null)\r\nX-Forwarded-For: fde1:53ba:e9a0:de13:d294:66ff:fe7e:345c\r\nX-Forwarded-Host: fe80\r"...
    eventCount = 1
    i = 0
    events = 0xa491d588
    ufds = {{fd = 33, events = 33, revents = 1}}
    rv =
    readCount =
    func = "listenTCP"
#4 0xa6c5c1ec in tcpThread (args=0x227ad50) at ../../../../../../../../../externalsrc/libredfish/src/asyncEvent.c:682
    data = 0x227ad50
    func = "tcpThread"
#5 0xa6a54144 in start_thread (arg=0xa14f9f20) at pthread_create.c:477
    ret =
    pd = 0xa14f9f20
    unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-1244860341, -1306122125, -1588617440, -1555049552, -1588617440, 338, -1555049554, 0, -1555049552, -1588618660, 0 <repeats 54 times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
    not_first_call =
    robust =
#6 0xa6bde878 in clone () at ../sysdeps/unix/sysv/linux/arm/clone.S:41
    No locals.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

#4 0xa6c5c1ec in tcpThread (args=0x227ad50) at ../../../../../../../../../externalsrc/libredfish/src/asyncEvent.c:682
(gdb) print *data->service
$2 = {host = 0x0, queue = 0x0, asyncThread = 2739920672, curl = 0x0, versions = 0x0, flags = 1, sessionToken = 0x0, bearerToken = 0x0, otherAuth = 0x0, pthread_mutex_t = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0, __nusers = 0, {__spins = 0, __list = { __next = 0x0}}}, __size = '\000' <repeats 23 times>, __align = 0}, refCount = 0, selfTerm = false, eventThreadQueue = 0xa3f01df0, eventThread = 2723135264, sseThread = 0, tcpThread = 2706349856, tcpSocket = -1, eventTerm = true, eventRegistrationUri = 0x22a6068 "p\026 \002", zeroMQListener = 0x0, sessionUri = 0x0, freeing = 232}

cowboyjeeper, Feb 23 '22 17:02

Things that look important to me in that service:

refCount = 0
eventTerm = true // if we are terminating why are we queueing this?
sessionToken = 0x0 // we got pretty far in the kill of the

I'd question if this service is even valid anymore. This might be a race?
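To make the suspicion concrete, here is a minimal sketch of the kind of guard that would refuse to queue an event once teardown has started. It is not libredfish code: `demo_service`, `demo_service_init`, `demo_enqueue_event`, and `demo_begin_teardown` are hypothetical names, and the struct only mirrors the `refCount` and `eventTerm` fields from the dump above, with a plain counter standing in for the event queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the service state seen in the gdb dump. */
typedef struct {
    pthread_mutex_t lock;
    int refCount;
    bool eventTerm;
    size_t queued; /* stands in for the real event queue */
} demo_service;

void demo_service_init(demo_service *s) {
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->refCount = 1;
    s->eventTerm = false;
    s->queued = 0;
}

/* Queue an event only while the service is alive and not terminating.
 * Returns true if the event was accepted, false if it was dropped. */
bool demo_enqueue_event(demo_service *s) {
    bool accepted = false;
    pthread_mutex_lock(&s->lock);
    if (!s->eventTerm && s->refCount > 0) {
        s->queued++;
        accepted = true;
    }
    pthread_mutex_unlock(&s->lock);
    return accepted;
}

/* Mark the service as terminating, as the dump shows (refCount = 0, eventTerm = true). */
void demo_begin_teardown(demo_service *s) {
    pthread_mutex_lock(&s->lock);
    s->eventTerm = true;
    s->refCount = 0;
    pthread_mutex_unlock(&s->lock);
}
```

A check like this only helps while the service memory itself is still valid. If the struct has already been freed, reading the flag is itself a use-after-free, so the guard cannot replace correct teardown ordering.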

cowboyjeeper, Feb 23 '22 18:02

Yeah, I'm pretty sure this is a race condition in the caller. I'm not sure how to fix it in the library.
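For anyone hitting the same thing, a caller-side ordering that avoids this class of race is to stop and join the listener thread before releasing whatever it delivers events into. The sketch below is generic pthread code under that assumption, not the libredfish API; `demo_ctx`, `demo_listener`, and `demo_run_and_shutdown` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical context shared between the caller and a listener thread. */
typedef struct {
    atomic_bool stop;
    atomic_int delivered;
} demo_ctx;

/* Listener loop: keeps "delivering" events into ctx until asked to stop. */
static void *demo_listener(void *arg) {
    demo_ctx *ctx = arg;
    while (!atomic_load(&ctx->stop)) {
        atomic_fetch_add(&ctx->delivered, 1);
        sched_yield();
    }
    return NULL;
}

/* Start a listener, then shut down in a safe order. Returns 0 on success. */
int demo_run_and_shutdown(void) {
    demo_ctx *ctx = malloc(sizeof *ctx);
    if (ctx == NULL) {
        return -1;
    }
    atomic_init(&ctx->stop, false);
    atomic_init(&ctx->delivered, 0);

    pthread_t tid;
    if (pthread_create(&tid, NULL, demo_listener, ctx) != 0) {
        free(ctx);
        return -1;
    }

    atomic_store(&ctx->stop, true);   /* 1. ask the listener to stop */
    int rc = pthread_join(tid, NULL); /* 2. wait until it has really exited */
    assert(rc != 0 || atomic_load(&ctx->stop));
    free(ctx);                        /* 3. only now release the shared state */
    return rc;
}
```

Freeing `ctx` before the join would recreate the situation in the backtrace: a thread still pushing into state that has already been torn down.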

pboyd04, Mar 29 '23 19:03

Closing; no further info to help solve this at the library level...

mraineri, Jun 28 '24 19:06