trafficserver icon indicating copy to clipboard operation
trafficserver copied to clipboard

In event processing, assert that mutex is unlocked before freeing mutex.

Open ywkaras opened this issue 2 years ago • 9 comments

The lifetime of mutex used for TS continuations is controlled by the number of referring continuations, it must not be destroyed with TSMutextDestroy().

Also has some very minor cleanup of Ptr.h.

ywkaras avatar Jun 14 '22 22:06 ywkaras

I think this looks good, has it been tested in prod?

zwoop avatar Jul 07 '22 16:07 zwoop

There's problems with it, I'm working on a fix.

ywkaras avatar Jul 07 '22 21:07 ywkaras

I think this looks good, has it been tested in prod?

Yes, now it has.

ywkaras avatar Aug 01 '22 17:08 ywkaras

This fixed a crash Yahoo was seeing in txn_box. txn_box was destroying a continuation with a mutex in the continuation function (so the mutex was locked). Continuations have a (Ptr) reference to their mutex, and the continuation's reference was the only one to its mutex. So the mutex got deleted while it was locked.

@SolidWallOfCode says he will change this code to get the mutex from the continuation, and unlock it, before destroying the continuation. So it boils down to whether we want to idiot proof the mutex destroy, so it will unlock if necessary before destruction.

ywkaras avatar Aug 01 '22 17:08 ywkaras

Example stack dump from crash. The line numbers will be off, it's Yahoo internal TS.

[ 00 ] libc-2.17.so         raise                                                                ( raise.c:55 )
[ 01 ] libc-2.17.so         abort                                                                ( abort.c:90 )
[ 02 ] libtscore.so.9.1.10  ink_abort                                                            ( ink_error.cc:99 )
[ 03 ] libtscore.so.9.1.10  ink_mutex_destroy                                                    ( ink_mutex.cc:72 )
[ 04 ] traffic_server       free                                                                 ( I_Lock.h:595 )
[ 05 ] traffic_server       operator=                                                            ( Ptr.h:241 )
[ 06 ] traffic_server       free_event                                                           ( P_UnixEThread.h:251 )
[ 07 ] traffic_server       EThread::process_queue(Queue<Event, Event::Link_link>*, int*, int*)  ( UnixEThread.cc:196 )
[ 08 ] traffic_server       EThread::execute_regular()                                           ( UnixEThread.cc:259 )
[ 09 ] traffic_server       execute                                                              ( UnixEThread.cc:364 )
[ 10 ] traffic_server       EThread::execute()                                                   ( UnixEThread.cc:342 )
[ 11 ] traffic_server       spawn_thread_internal                                                ( Thread.cc:91 )
[ 12 ] libpthread-2.17.so   start_thread                                                         ( pthread_create.c:307 )

ywkaras avatar Aug 01 '22 23:08 ywkaras

I think these lines of code may potentially cause this crash: https://github.com/SolidWallOfCode/txn_box/blob/dev-0-4/plugin/src/ts_util.cc#L955 https://github.com/SolidWallOfCode/txn_box/blob/dev-0-4/plugin/src/ts_util.cc#L981

The continuation has the only Ptr reference to the mutex. So, when the continuation is destroyed, the reference is cleared, causing the mutex to be destroyed, while it's still locked.

ywkaras avatar Aug 02 '22 00:08 ywkaras

I think these lines of code may potentially cause this crash: https://git.ouryahoo.com/mirror-github/SolidWallOfCode--txn_box/blob/0.4.5/plugin/src/ts_util.cc#L1030 https://git.ouryahoo.com/mirror-github/SolidWallOfCode--txn_box/blob/0.4.5/plugin/src/ts_util.cc#L1003

The continuation has the only Ptr reference to the mutex. So, when the continuation is destroyed, the reference is cleared, causing the mutex to be destroyed, while it's still locked.

I think these links are internal to yahoo only.

shukitchan avatar Aug 02 '22 01:08 shukitchan

I think these links are internal to yahoo only.

Oops, fixed.

ywkaras avatar Aug 02 '22 14:08 ywkaras

[approve ci autest]

ywkaras avatar Sep 06 '22 16:09 ywkaras

This pull request has been automatically marked as stale because it has not had recent activity. Marking it stale to flag it for further consideration by the community.

github-actions[bot] avatar Mar 02 '23 02:03 github-actions[bot]