freeswitch
freeswitch copied to clipboard
Hangup a call when starting recording may cause zombie channels
Describe the bug
Hangup a call when starting recording may cause zombie channels.
The coredump shows that the channel is waiting a condition signal in recording_thread
.
To Reproduce Hard to reproduce
Expected behavior Channel cleared
Package version or git hash
- 1.10.7
Trace logs
backtrace from core file If applicable, provide the full backtrace from the core file.
(gdb) t 54
[Switching to thread 54 (LWP 556260)]
#0 0x00007f63f02460df in pthread_rwlock_wrlock () from /mnt/volume1/debug/usr/lib/x86_64-linux-gnu/libpthread-2.28.so
(gdb) bt full
#0 0x00007f63f02460df in pthread_rwlock_wrlock () from /mnt/volume1/debug/usr/lib/x86_64-linux-gnu/libpthread-2.28.so
No symbol table info available.
#1 0x00007f63f0680a19 in apr_thread_rwlock_wrlock (rwlock=<optimized out>) at locks/unix/thread_rwlock.c:97
stat = <optimized out>
#2 0x00007f63f0309dd5 in switch_thread_rwlock_wrlock (rwlock=<optimized out>) at src/switch_apr.c:253
No locals.
#3 0x00007f63f0348b2c in switch_core_session_write_lock (session=<optimized out>) at src/switch_core_rwlock.c:154
No locals.
#4 0x00007f63f033d73f in switch_core_session_thread (thread=<optimized out>, obj=<optimized out>) at src/switch_core_session.c:1755
session = 0x7f63a453d308
event = 0x7f6362eb2bc0
event_str = 0x0
val = <optimized out>
__func__ = <error reading variable __func__ (Cannot access memory at address 0x7f63f06a1650)>
__PRETTY_FUNCTION__ = <error reading variable __PRETTY_FUNCTION__ (Cannot access memory at address 0x7f63f06a1630)>
#5 0x00007f63f0338e2b in switch_core_session_thread_pool_worker (thread=0x7f638829b1c0, obj=<optimized out>) at src/switch_core_session.c:1800
td = 0x7f63a454d138
pop = 0x7f63a454d138
check_status = <optimized out>
node = <optimized out>
pool = 0x7f638829af68
__func__ = <error reading variable __func__ (Cannot access memory at address 0x7f63f06a16e0)>
#6 0x00007f63f0686b6c in dummy_worker (opaque=0x7f638829b1c0) at threadproc/unix/thread.c:151
thread = 0x7f638829b1c0
#7 0x00007f63f0240fa3 in start_thread () from /mnt/volume1/debug/usr/lib/x86_64-linux-gnu/libpthread-2.28.so
No symbol table info available.
#8 0x00007f63efdf306f in clone () from /mnt/volume1/debug/usr/lib/x86_64-linux-gnu/libc-2.28.so
(gdb) t 55
[Switching to thread 55 (LWP 557436)]
#0 0x00007f63f024700c in pthread_cond_wait@@GLIBC_2.3.2 () from /mnt/volume1/debug/usr/lib/x86_64-linux-gnu/libpthread-2.28.so
(gdb) bt full
#0 0x00007f63f024700c in pthread_cond_wait@@GLIBC_2.3.2 () from /mnt/volume1/debug/usr/lib/x86_64-linux-gnu/libpthread-2.28.so
No symbol table info available.
#1 0x00007f63f03e73e1 in recording_thread (thread=<optimized out>, obj=<optimized out>) at src/switch_ivr_async.c:1329
bug = <optimized out>
session = 0x7f63a453d308
channel = 0x7f63a453b3e0
rh = 0x555782e9df00
bsize = 8192
samples = 0
inuse = <optimized out>
data = 0x7f635c14d028 ""
channels = 2
read_impl = {codec_type = SWITCH_CODEC_TYPE_AUDIO, ianacode = 0 '\000', iananame = 0x0, fmtp = 0x0, samples_per_second = 0, actual_samples_per_second = 0, bits_per_second = 0,
microseconds_per_packet = 0, samples_per_packet = 0, decoded_bytes_per_packet = 0, encoded_bytes_per_packet = 0, number_of_channels = 0 '\000', codec_frames_per_packet = 0,
init = 0x0, encode = 0x0, decode = 0x0, encode_video = 0x0, decode_video = 0x0, codec_control = 0x0, destroy = 0x0, codec_id = 0, impl_id = 0, modname = 0x0, next = 0x0}
__func__ = <error reading variable __func__ (Cannot access memory at address 0x7f63f06bd590)>
__PRETTY_FUNCTION__ = <error reading variable __PRETTY_FUNCTION__ (Cannot access memory at address 0x7f63f06bd570)>
#2 0x00007f63f0686b6c in dummy_worker (opaque=0x555782e9e970) at threadproc/unix/thread.c:151
thread = 0x555782e9e970
#3 0x00007f63f0240fa3 in start_thread () from /mnt/volume1/debug/usr/lib/x86_64-linux-gnu/libpthread-2.28.so
No symbol table info available.
#4 0x00007f63efdf306f in clone () from /mnt/volume1/debug/usr/lib/x86_64-linux-gnu/libc-2.28.so
When the recording media_bug is initing, if hang up the call now, the problem will be reproduced.
To Reproduce
- patch
switch_ivr_async.c
Make sure we have enough time to hang up the call before the media bug init finishes.
--- switch_ivr_async.c 2023-08-24 13:46:50.935017745 +0800
+++ switch_ivr_async.c.patched 2023-08-24 13:55:23.135628464 +0800
@@ -1509,6 +1509,8 @@ static switch_bool_t record_callback(swi
switch_core_media_gen_key_frame(session);
}
}
+ /* Make sure we have enough time to hang up the call before the media bug init finishes */
+ switch_yield(20000000);
}
break;
case SWITCH_ABC_TYPE_TAP_NATIVE_READ:
- make a call
- bgapi uuid_record
- hupall
@hateeyan Thanks for the details. Can you please describe here a little bit more information about this PR and how it changes the behavior.
Hi, @andywolk
https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_core_session.c#L1717-L1747
After hanging up the call, session_thread
removed all media bugs (line 1728).
https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_core_media_bug.c#L955-L1000
Media bug will be appended to session->bugs
after the initialization finished (SWITCH_ABC_TYPE_INIT
).
In this case, the media bug will never be colsed (SWITCH_ABC_TYPE_CLOSE
).
https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_ivr_async.c#L1611-L1621
In record_callback
, if the media bug is not closed, the thread condition will not signal.
https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_ivr_async.c#L1276-L1278
https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_ivr_async.c#L1327-L1333
In recording_thread
, lock the session and wait for the condition.
A deadlock happens which can explain the backtrace.
if we lock the bug_rwlock, it will wait for the initialization to complete.
if we lock the bug_rwlock, it will wait for the initialization to complete.
That's the part I was more interested in. Can you please add a little bit more information about how this changes the behavior.
https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_core_media_bug.c#L1237-L1244
Because it will also lock when removing media bugs. If a media bug is initing, it will block until the bug_rwlock unlocked.
https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_core_media_bug.c#L1237-L1244
Because it will also lock when removing media bugs. If a media bug is initing, it will block until the bug_rwlock unlocked.
Is it possible that switch_core_media_bug_add
is called after switch_core_media_bug_remove_all_function
?
https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_core_media_bug.c#L1237-L1244
Because it will also lock when removing media bugs. If a media bug is initing, it will block until the bug_rwlock unlocked.
Is it possible that
switch_core_media_bug_add
is called afterswitch_core_media_bug_remove_all_function
?
I just tested this case and it doesn't cause problems.
-
switch_core_media_bug_remove_all_function
will be called again inswitch_core_session_destroy
. https://github.com/signalwire/freeswitch/blob/b74245d48a1f65a05e853f24e973f9b9ff35f8f5/src/switch_core_session.c#L1558 -
Before first calling
switch_core_media_bug_remove_all_function
, the channel has been set toCS_DESTROY
, new api calls(uuid_record
) will fail.
Found a related issue #1673
It works fine in our production environment for almost 3 months with the patch #2218.
We're almost a year out with this patch. Are there any status updates? I'm seeing this issue in the wild.