switchio
Add a timeout to the example dialplan
One issue I noticed with a park-only dialplan is that if for some reason the inbound socket process fails or dies, the parked calls remain in the FreeSWITCH db (show calls will list them), filling up the internal db until it maxes out and FreeSWITCH stops accepting new calls.
Is there something that can be done in the dialplan other than just calling park, so that if there's no inbound server to process the call, it will just terminate the channel, similar to outbound socket?
@k4ml yes you can set a timeout using the park_timeout channel variable.
I think that's the way I've normally handled it in the past; if it doesn't work let me know and I'll make a test with a working solution. Actually we should probably document that in the dialplan example stuff.
In general I'd like to ship a production grade example dialplan with switchio. @k4ml if you come up with a good working version we'll gladly take a PR for it.
@k4ml btw have you noticed that the socket dies with switchio in particular?
If that is the case we likely have a bug.
@tgoodlet I haven't tried switchio yet, but this is what I notice when using a park-only dialplan + inbound socket in general. Even if the socket doesn't die, you must explicitly hang up the call somewhere in your app.
So with park_timeout set, I guess we also need to unpark the call if we manage to handle it in channel_park, otherwise the call will be terminated halfway through. Looking here, we have to do that with uuid_transfer, but that will bring the call back into the dialplan, which means we need at least 2 extensions in the dialplan, one to handle a new incoming call (put it into park) and another one to handle the unparked call, otherwise we will get into an infinite loop?
which means we need at least 2 extensions in the dialplan, one to handle a new incoming call (put it into park) and another one to handle the unparked call, otherwise we will get into an infinite loop?
No I don't believe this is true. Using any other call mgmt cmd should take the session out of the CHANNEL_PARK state/app afaik. In the wiki section you linked to, uuid_transfer is just used as the typical command within an XML dialplan that would accomplish this. If you've found this is not true that would be a problem, but I'd find it hard to believe as I've never experienced it and the inbound socket approach is the recommended approach according to the wiki, including the use of a park app.
Actually on top of ^ outbound mode parks calls in the same way before transferring control.
@k4ml if this build passes it should prove my point, as there are calls during some of the stress tests in the suite that are kept up longer than 3 seconds (the timeout I added).
Yep just verified it manually as well. As soon as you do anything else with the session the park timeout is cancelled. I'm going to write a formal test to demonstrate this.
@k4ml added a test case in PR #49 which verifies everything I've claimed against the latest FS docker image.
Hope that clears up all your questions.
The park_timeout variable is used to calculate an expiration time for being inside the park loop, not necessarily the park state. I think the problem with relying on that is that as soon as your application ends, you're back in the loop and the expiry check will cause the park loop to end and hang up the channel.
The park loop only ends when:
- The channel is hung up
- The CF_PARK or CF_CONTROLLED flags are not set anymore
- Various error conditions (e.g failed to read I/O from the channel)
- Park timeout
I might be missing something, but I don't think that timeout would be cleared by executing applications. However, it won't trigger if you're inside a long-running application (e.g playback, bert, or whatever). It will trigger when that application ends (if it ends after the expiry time). If you execute a never-ending application such as endless_playback or bert, then the timer will appear to never fire.
Probably the solution you'd be looking for is an activity timeout. E.g, quit if no commands have been received in X amount of time. It'd basically be the same as the parking timeout, just that it would be reset at the end of executing every command.
For outbound sockets it's different because the outbound socket explicitly clears the CF_CONTROLLED flag to exit the park loop when the socket goes down.
Note all of this would be to work-around a buggy call control server. Any call control server should restart if it dies (e.g via systemd unit restart) and recover control of the sessions or hang up the old sessions (depending on how much state-keeping you preserve after dying).
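To make the difference between the two timeout styles concrete, here is a small Python model of the behaviour described above. It is only a sketch under stated assumptions: it is not FreeSWITCH code, time is counted in whole seconds, `simulate_park` and `reset_on_activity` are hypothetical names, and the expiry check is assumed to run at the top of each loop iteration before any queued command is looked at.

```python
def simulate_park(commands, park_timeout, horizon, reset_on_activity=False):
    """Return the time at which the modelled park loop times out, or None.

    commands: list of (arrival_time, duration) tuples sorted by arrival_time.
    horizon:  stop simulating at this time if nothing has timed out.
    """
    pending = list(commands)
    now = 0
    expires = now + park_timeout  # computed once when the loop is entered
    while now < horizon:
        # Expiry is checked first, before any queued command is considered.
        if now >= expires:
            return now
        if pending and pending[0][0] <= now:
            _, duration = pending.pop(0)
            now += duration  # the loop is blocked while the application runs
            if reset_on_activity:
                # Hypothetical activity timeout: restart the clock after
                # every completed command.
                expires = now + park_timeout
        else:
            now += 1  # idle tick, nothing to execute
    return None


two_playbacks = [(1, 10), (11, 10)]  # two 10 second playbacks, back to back
fixed = simulate_park(two_playbacks, park_timeout=5, horizon=60)
activity = simulate_park(two_playbacks, park_timeout=5, horizon=60,
                         reset_on_activity=True)
```

In this model the fixed expiry fires at t=11, right after the first playback returns and before the second one is picked up, while the activity variant lets both playbacks run and only fires at t=26, five idle seconds after the last command.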
@moises-silva thanks for the in depth analysis! Getting the source details is super helpful.
In my proposed test it seems simply answering or bridging the session prevents the timeout as well. Maybe I should try un-parking the session and see if it times out?
Also the condition in that while statement, particularly switch_channel_ready(channel): if you dig down it seems to be a macro for switch_channel_test_ready called with (_channel, TRUE, FALSE), which in turn is handled in switch_channel.c and has a nasty if statement that may break the loop:
if (!channel->hangup_cause && channel->state > CS_ROUTING && channel->state < CS_HANGUP && channel->state != CS_RESET &&
    !switch_channel_test_flag(channel, CF_TRANSFER) && !switch_channel_test_flag(channel, CF_NOT_READY) &&
    !switch_channel_state_change_pending(channel)) {
    ret++;
}
Particularly !switch_channel_test_flag(channel, CF_NOT_READY) && !switch_channel_state_change_pending(channel) should likely return TRUE if the channel is operated on by some other command, no?
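For readers who find the C condition hard to scan, the following Python sketch restates just that one if statement as a predicate. It is a model under assumptions, not the real function: the rest of switch_channel_test_ready is omitted, the `Channel` class and the string flag names are hypothetical stand-ins, and the state ordering is assumed to mirror the switch_channel_state_t enum.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class ChannelState(IntEnum):
    # Assumed to follow the ordering of switch_channel_state_t.
    CS_NEW = 0
    CS_INIT = 1
    CS_ROUTING = 2
    CS_SOFT_EXECUTE = 3
    CS_EXECUTE = 4
    CS_EXCHANGE_MEDIA = 5
    CS_PARK = 6
    CS_CONSUME_MEDIA = 7
    CS_HIBERNATE = 8
    CS_RESET = 9
    CS_HANGUP = 10
    CS_REPORTING = 11
    CS_DESTROY = 12


@dataclass
class Channel:
    """Hypothetical stand-in for the fields the condition reads."""
    state: ChannelState = ChannelState.CS_PARK
    hangup_cause: int = 0
    flags: set = field(default_factory=set)
    state_change_pending: bool = False


def channel_ready(ch):
    """Mirror of the quoted if condition only; other checks are left out."""
    return (
        not ch.hangup_cause
        and ChannelState.CS_ROUTING < ch.state < ChannelState.CS_HANGUP
        and ch.state != ChannelState.CS_RESET
        and "CF_TRANSFER" not in ch.flags
        and "CF_NOT_READY" not in ch.flags
        and not ch.state_change_pending
    )


parked = channel_ready(Channel())
transferring = channel_ready(Channel(flags={"CF_TRANSFER"}))
changing = channel_ready(Channel(state_change_pending=True))
```

Under these assumptions a plain parked channel counts as ready, while a pending transfer or a pending state change makes the predicate false, which is what would let the while loop exit.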
Yeah so if I understand this correctly the hangup via park_timeout can only occur if you're still inside that while. This seems to be verified by the value not being read anywhere else in the code base:
>>> git grep park_timeout
src/mod/endpoints/mod_sofia/sofia.c: switch_channel_set_variable(channel_b, "park_timeout", "2:attended_transfer");
src/mod/endpoints/mod_sofia/sofia.c: switch_channel_set_variable(channel, "park_timeout", "600:blind_transfer");
src/mod/endpoints/mod_verto/mod_verto.c: switch_channel_set_variable(b_tech_pvt->channel, "park_timeout", "2:attended_transfer");
src/switch_core_state_machine.c: switch_channel_set_variable(session->channel, "park_timeout", "10:blind_transfer");
src/switch_ivr.c: if ((to = switch_channel_get_variable(channel, "park_timeout"))) {
src/switch_ivr.c: switch_channel_set_variable(channel, "park_timeout", NULL);
src/switch_ivr_bridge.c: switch_channel_set_variable(channel, "park_timeout", "3");
But maybe I'm missing something?
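The values in the grep output above all have the shape "seconds" or "seconds:cause". A minimal Python sketch of splitting such a value is shown below; `parse_park_timeout` is a hypothetical helper for illustration, not part of FreeSWITCH or switchio, and it deliberately returns None when no cause is given rather than guessing which default the core applies.

```python
def parse_park_timeout(value):
    """Split a park_timeout value into (seconds, cause_name_or_None)."""
    seconds, sep, cause = value.partition(":")
    # No colon means only a timeout was given; leave the cause unspecified.
    return int(seconds), (cause if sep else None)


with_cause = parse_park_timeout("5:DESTINATION_OUT_OF_ORDER")
without_cause = parse_park_timeout("3")
```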
Yeah agreed, I thought switch_channel_ready() was checking just for hangup and other media io checks, but seems a state change also makes it bail. I'm curious now on what happens then after the last application executes.
@moises-silva In my test, where I didn't hang up after a playback, the call still got hung up after the playback ended, with the DESTINATION_OUT_OF_ORDER cause, which I think confirms it came from the park_timeout in the dialplan.
So I tested executing playback twice. With park_timeout, the call got hung up after the first playback.
@k4ml did you answer the call or put it in another state before executing playback?
Can you give an example of the dialplan that you're using?
@tgoodlet The call was answered and I tested with the same dialplan switchio uses:
<include>
  <?xml version="1.0" encoding="utf-8"?>
  <!-- A context for relinquishing control of all calls to switchio, the inbound ESL client -->
  <context name="public">
    <!-- Park call and transfer control to esl -->
    <extension name="switchiopark">
      <condition field="destination_number" expression="^(.*)$">
        <action application="set" data="park_timeout=5:DESTINATION_OUT_OF_ORDER"/>
        <action application="park"/>
      </condition>
    </extension>
  </context>
</include>
Btw I'm not using switchio here but my own esl lib. I'll test with switchio later on.
@k4ml no I meant what is your ESL app doing after handling the CHANNEL_PARK event?
Do you execute session.playback('blah') right away or do you session.answer() first?
Also if you'd rather get quicker feedback on this join our Riot room to chat.
@tgoodlet I executed session.answer() first. This is the snippet of the code:-
if commands.name == 'playback':
    if not _credits_enough(call_data['nibble_rate']):
        return
    sound_url = commands.args[0]
    session.answer()
    playback = session.playback(sound_url, **call_data)
    if playback.stop():
        session.playback(sound_url, **call_data)
@k4ml hmm I wonder if it matters that you call session.playback() only after the answer has completed.
As in you wait for the CHANNEL_ANSWER to arrive first, because that's what my test is doing.
I'll try the test I have with the playback like you have.
@tgoodlet You mean I should wait for CHANNEL_EXECUTE_COMPLETE after executing session.answer() before proceeding with playback?
@k4ml maybe, I'm not sure. I know in switchio when we do await sess.answer() underneath the hood we wait for the "CHANNEL_ANSWER" event.
Let me try what you're doing before going off on a tangent trying to prove my theory correct heh.
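The ordering question being discussed (issue a command, then wait for its confirming event before issuing the next one) can be sketched with plain asyncio. Everything here is hypothetical scaffolding for illustration: `FakeSession` is not the switchio API or any real ESL client, and the pairing of each command with a single confirming event is an assumption made to keep the model small.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for an ESL session; not a real client."""

    # Assumed command -> confirming event pairing, for this model only.
    REPLIES = {"answer": "CHANNEL_ANSWER", "playback": "PLAYBACK_STOP"}

    def __init__(self):
        self.events = asyncio.Queue()
        self.log = []

    def send(self, command):
        """Record the command and schedule its confirming event."""
        self.log.append(("cmd", command))
        loop = asyncio.get_running_loop()
        loop.call_later(0.01, self.events.put_nowait, self.REPLIES[command])

    async def recv(self, name):
        """Consume events until the named one arrives."""
        while True:
            event = await self.events.get()
            self.log.append(("evt", event))
            if event == name:
                return event


async def answer_then_play(sess):
    sess.send("answer")
    await sess.recv("CHANNEL_ANSWER")  # do not start media until answered
    sess.send("playback")
    await sess.recv("PLAYBACK_STOP")
    return sess.log


async def main():
    # Create the session inside the running loop so the queue binds to it.
    return await answer_then_play(FakeSession())


log = asyncio.run(main())
```

The recorded log interleaves commands and events strictly in order, so the playback is never sent before the answer has been confirmed.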
@k4ml ok so I was able to replicate the situation you describe, where after playback the park timeout cause is used to hang up the call, although I don't seem to be able to get that behaviour consistently.
I'm going to investigate a little further.
Further progress on this. I found that FS core is exhibiting unreliable uuid_broadcast behaviour and so I've deprecated its usage as part of #52. I now have playback after park working again and it seems that now I'm never receiving a PLAYBACK_STOP event until I manually kill the playback app using uuid_break. Once I do this I do see the same situation as @k4ml where the park_timeout logic activates and the session is torn down via the coded hangup code. Luckily, for now, if uuid_break is never called (eg. using Session.breakmedia() in switchio) then the session stays in the playback app and the park_timeout never activates.
@moises-silva I personally think this is incorrect behaviour and FS core should move this park_timeout logic further down inside switch_ivr_park, to the end of the function, such that incoming events are processed before a timeout can occur. Do you think it's worth proposing to the core team? I also think park_timeout should be a timer that is reset each time the park loop is re-entered.
The behavior I noticed above still similar with this switchio snippet:-
from switchio.apps.routers import Router

router = Router(guards={
    'Call-Direction': 'inbound',
    },
    subscribe=('PLAYBACK_STOP',)
)

@router.route('(.*)')
async def welcome(sess, match, router):
    """Say hello to inbound calls.
    """
    await sess.answer()  # resumes once call has been fully answered
    sess.log.info("Answered call to {}".format(match.groups(0)))
    sess.playback('media.mp3')  # non-blocking
    sess.log.info("Playing welcome message")
    await sess.recv("PLAYBACK_STOP")
    sess.playback('media.mp3')  # non-blocking
    sess.log.info("Playing again ...")
    await sess.recv("PLAYBACK_STOP")
    await sess.hangup()  # resumes once call has been fully hungup
    sess.log.info("%s hangup" % sess.uuid)
With park_timeout, the call hangs up after the first playback with the coded hangup cause. This is in the router_extra_subscribe branch.
@k4ml does the file media.mp3 actually exist on your FS minion? I have seen that if you fail to playback a file the park_timeout will kick in. I will bet that you'll see errors in the FS log and then the teardown due to the timeout.
@tgoodlet oh, sorry. media.mp3 is just to mask a real file which is accessed via http. But I can verify the media is being played, I can hear it, and there are no errors in the freeswitch log either.
@k4ml yeah so looking at the core FS code more I think we'll need to propose a patch to core to make this work the way we want. I'm happy to do this, just not sure when I'll get some time next, hopefully this week.