
Deadlock in conference

Open zhaojiecll opened this issue 3 years ago • 1 comments

Hi guys. We hit a problem: when we run "conference xxx energy all xxx" at the same moment someone leaves the conference, FreeSWITCH gets stuck. We used fsctl crash to crash FreeSWITCH and collected the backtraces of all threads. After checking them, it looks like there is a deadlock. Please check the backtraces in deadlockBt.txt, starting with thread 103 and thread 99.

Thread 103:
in conference_member::conference_member_itterator, switch_mutex_lock(conference->member_mutex)
in conference_api::conference_api_sub_energy, lock_member(member)
Lock order: conference->member_mutex >> lock_member(member)

Thread 99:
in conference_member::conference_member_del, lock_member(member)
in conference_cdr::conference_cdr_del, switch_mutex_lock(member->conference->member_mutex)
Lock order: lock_member(member) >> member->conference->member_mutex

Now we can see the problem: if thread 103 locks conference->member_mutex while thread 99 takes lock_member(member), each thread then blocks on the lock the other one holds. Neither can proceed, so both locks stay held forever and the conference is stuck.
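The inversion between thread 103 and thread 99 can be reproduced in isolation. This is only a minimal sketch, assuming plain pthread mutexes in place of FreeSWITCH's switch_mutex_t; the names member_lock, lock_order and count_blocked_threads are hypothetical and not part of mod_conference. It uses trylock for the second acquisition so the program reports the conflict instead of hanging the way the real code does.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for conference->member_mutex and the per-member lock. */
static pthread_mutex_t member_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t member_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int holding_first;
static atomic_int attempted;

struct lock_order {
    pthread_mutex_t *first;
    pthread_mutex_t *second;
    int second_was_busy;
};

/* Simple two-thread rendezvous: returns once both threads have arrived. */
static void wait_for_both(atomic_int *counter)
{
    atomic_fetch_add(counter, 1);
    while (atomic_load(counter) < 2) {
        sched_yield();
    }
}

static void *take_in_order(void *arg)
{
    struct lock_order *o = arg;

    pthread_mutex_lock(o->first);
    wait_for_both(&holding_first);   /* both threads now hold their first lock */

    /* A blocking lock here would never return; trylock only reports it. */
    o->second_was_busy = (pthread_mutex_trylock(o->second) == EBUSY);

    wait_for_both(&attempted);       /* keep the first lock until both have tried */
    if (!o->second_was_busy) {
        pthread_mutex_unlock(o->second);
    }
    pthread_mutex_unlock(o->first);
    return NULL;
}

/* Returns how many of the two threads found their second lock already held. */
int count_blocked_threads(void)
{
    struct lock_order energy = { &member_mutex, &member_lock, 0 }; /* like thread 103 */
    struct lock_order del = { &member_lock, &member_mutex, 0 };    /* like thread 99 */
    pthread_t t1, t2;

    atomic_store(&holding_first, 0);
    atomic_store(&attempted, 0);
    pthread_create(&t1, NULL, take_in_order, &energy);
    pthread_create(&t2, NULL, take_in_order, &del);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return energy.second_was_busy + del.second_was_busy;
}
```

Calling count_blocked_threads() reports that both threads found their second lock busy. With blocking locks in those two places, that is exactly the state where neither thread can ever continue.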

Next, thread 95. After the steps above, if we run "conference list":
in conference_api::conference_api_sub_list, switch_mutex_lock(conference_globals.hash_mutex)
in mod_conference::conference_list, switch_mutex_lock(conference->member_mutex)
Lock order: conference_globals.hash_mutex >> conference->member_mutex (already locked in thread 103)

Then thread 113:
in mod_conference::conference_function, switch_mutex_lock(conference_globals.setup_mutex)
in mod_conference::conference_find, switch_mutex_lock(conference_globals.hash_mutex)
Lock order: conference_globals.setup_mutex >> conference_globals.hash_mutex (locked in thread 95)

Now conference_globals.setup_mutex and conference_globals.hash_mutex are both held by blocked threads and will never be released. From this point on, the conference module is no longer available.
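The chain of blocked threads can be written down as a small wait-for table and walked mechanically. This is just an illustrative sketch: the thread numbers and lock names come from the backtrace analysis above, while the wait_edge table and the find_deadlocked_thread helper are hypothetical names made up for this example.

```c
#include <stddef.h>

/* One row per blocked thread: which thread holds the lock it is waiting for. */
struct wait_edge {
    int waiter;
    int holder;
    const char *lock;
};

static const struct wait_edge edges[] = {
    { 113, 95,  "conference_globals.hash_mutex" },
    { 95,  103, "conference->member_mutex" },
    { 103, 99,  "lock_member(member)" },
    { 99,  103, "conference->member_mutex" },
};
static const size_t n_edges = sizeof(edges) / sizeof(edges[0]);

/* Returns the thread holding what `waiter` needs, or -1 if it is not blocked. */
static int holder_for(int waiter)
{
    for (size_t i = 0; i < n_edges; i++) {
        if (edges[i].waiter == waiter) {
            return edges[i].holder;
        }
    }
    return -1;
}

/* Follows the wait chain from `start`. Returns the first thread seen twice
 * (the chain runs into a cycle), or -1 if the chain ends at a runnable thread. */
int find_deadlocked_thread(int start)
{
    int seen[8];
    size_t n_seen = 0;

    for (int t = start; t != -1 && n_seen < 8; t = holder_for(t)) {
        for (size_t i = 0; i < n_seen; i++) {
            if (seen[i] == t) {
                return t;
            }
        }
        seen[n_seen++] = t;
    }
    return -1;
}
```

Starting from thread 113, the walk goes 113, 95, 103, 99 and then reaches 103 again, so every thread on that chain is waiting, directly or indirectly, on the 103/99 pair.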

BTW, the branch we use is 1.10.7-dev git 5e97e5b 2021-05-10 22:54:24z 64bit, plus a few small modifications of our own, so the line numbers will not match any of your branches. I checked the latest code, though, and the basic logic is unchanged.

You can find more backtrace info in the attachment named deadlockBt.txt. Please correct me if I am wrong, and feel free to ask if you need anything from me.

zhaojiecll avatar May 20 '22 11:05 zhaojiecll

We tried to fix it our own way, using a trylock instead of lock_member in conference_api_sub_energy. It works, and so far so good. But we still look forward to your official fix.
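The shape of that workaround is roughly as follows. This is a simplified sketch and not our actual patch: it assumes plain pthread mutexes and cut-down hypothetical structs (struct member, struct conference, set_energy_all) in place of the real mod_conference types. Skipping a busy member is one possible policy; retrying later would be another.

```c
#include <pthread.h>

/* Hypothetical, simplified stand-ins for the real conference structures. */
struct member {
    pthread_mutex_t lock;
    int energy_level;
    struct member *next;
};

struct conference {
    pthread_mutex_t member_mutex;
    struct member *members;
};

/* Sets the energy level on every member whose lock is free right now.
 * Returns the number of members skipped because their lock was busy. */
int set_energy_all(struct conference *conf, int level)
{
    int skipped = 0;

    pthread_mutex_lock(&conf->member_mutex);
    for (struct member *m = conf->members; m != NULL; m = m->next) {
        /* Never block on a member lock while holding member_mutex. */
        if (pthread_mutex_trylock(&m->lock) != 0) {
            skipped++;
            continue;
        }
        m->energy_level = level;
        pthread_mutex_unlock(&m->lock);
    }
    pthread_mutex_unlock(&conf->member_mutex);

    return skipped;
}
```

Because the iterating thread never waits for a member lock while it holds member_mutex, a member that is in the middle of leaving can still take member_mutex afterwards and finish. The cost is that the energy change is not applied to that member.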

zhaojiecll avatar May 25 '22 10:05 zhaojiecll