switchio Add a timeout to the example dialplan

One issue I noticed with a park-only dialplan is that if the inbound socket process fails or dies for some reason, the parked calls remain in the FreeSWITCH db (show calls will still list them), filling up FreeSWITCH's internal db until it maxes out and FreeSWITCH stops accepting new calls.

Is there something that can be done in the dialplan other than just calling park, so that if there's no inbound server to process the call, the channel is simply terminated, similar to an outbound socket?

Nov 30 '17 00:11 k4ml

@k4ml yes you can set a timeout using the park_timeout channel variable.

I think that's the way I've normally handled it in the past; if it doesn't work let me know and I'll make a test with a working solution. Actually we should probably document that in the dialplan example stuff.

In general I'd like to ship a production grade example dialplan with switchio. @k4ml if you come up with a good working version we'll gladly take a PR for it.
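For reference, the park_timeout values in this thread (`5:DESTINATION_OUT_OF_ORDER` in the dialplan, and `600:blind_transfer` or a bare `3` in the FreeSWITCH sources) follow a `<seconds>[:<hangup cause>]` pattern. A minimal Python sketch of splitting such a value; `parse_park_timeout` is a hypothetical helper, and the cause assumed when none is given (`NORMAL_CLEARING`) is a placeholder assumption, not confirmed from the FreeSWITCH source.

```python
from typing import Tuple


def parse_park_timeout(value: str,
                       default_cause: str = "NORMAL_CLEARING") -> Tuple[int, str]:
    """Split a park_timeout value of the form '<seconds>[:<cause>]'.

    Hypothetical helper for illustration; default_cause is an assumption.
    """
    seconds_part, sep, cause = value.partition(":")
    seconds = int(seconds_part)
    if seconds < 0:
        raise ValueError("park_timeout seconds must be non-negative")
    # Use the explicit cause if one follows the colon, else the default.
    return seconds, (cause if sep and cause else default_cause)
```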

Nov 30 '17 01:11 goodboy

@k4ml btw have you noticed that the socket dies with switchio in particular? If that is the case we likely have a bug.

Nov 30 '17 01:11 goodboy

@tgoodlet I haven't tried switchio yet; this is what I've noticed when using a park-only dialplan with an inbound socket in general. Even if the socket doesn't die, you must explicitly hang up the call somewhere in your app.

Nov 30 '17 02:11 k4ml

So with park_timeout set, I guess we also need to unpark the call once we handle it on CHANNEL_PARK, otherwise the call will be terminated halfway through. Judging by the linked wiki section, we have to do that with uuid_transfer, but that brings the call back into the dialplan, which means we need at least 2 extensions in the dialplan, one to handle new incoming calls (put them into park) and another to handle unparked calls, otherwise we'll get into an infinite loop?

Nov 30 '17 02:11 k4ml

which mean we need at least 2 extensions in the dialplan, one to handle new incoming call (put into park) and another one to handle unpark call, otherwise we will get into infinite loop ?

No, I don't believe this is true. Using any other call management command should take the session out of the CHANNEL_PARK state/app, afaik. In the wiki section you linked to, uuid_transfer is just used as the typical command that would accomplish this from within an XML dialplan. If you've found this is not true that would be a problem, but I'd find it hard to believe, as I've never experienced it and the inbound socket approach is the recommended approach according to the wiki, including the use of a park app.

On top of that, outbound mode parks calls in the same way before transferring control.
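For comparison, a call management command sent over an inbound event socket is just a text request. A sketch assuming the plain ESL wire format (`api <command> <args>` terminated by a blank line); `build_api_command` is a hypothetical helper that only builds the bytes and does no socket I/O, and the extension `1000` and context `default` used below are placeholders.

```python
def build_api_command(command: str, *args: str) -> bytes:
    """Format an ESL 'api' request (hypothetical helper, no socket I/O)."""
    parts = [command, *args]
    if any("\n" in part for part in parts):
        raise ValueError("ESL command arguments must not contain newlines")
    line = " ".join(["api", *parts])
    return (line + "\n\n").encode()  # a blank line terminates the request


# e.g. move a parked call to extension 1000 in the XML 'default' context
request = build_api_command("uuid_transfer", "some-uuid", "1000", "XML", "default")
```

The same helper would cover `uuid_break <uuid>` or `uuid_kill <uuid>`; sending the bytes and reading the reply is left to whichever ESL client you use.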

Nov 30 '17 03:11 goodboy

@k4ml if this build passes it should prove my point, as some of the stress tests in the suite keep calls up longer than 3 seconds (the timeout I added).

Nov 30 '17 03:11 goodboy

Yep just verified it manually as well. As soon as you do anything else with the session the park timeout is cancelled. I'm going to write a formal test to demonstrate this.

Nov 30 '17 04:11 goodboy

@k4ml added a test case in PR #49 which verifies everything I've claimed against the latest FS docker image.

Hope that clears up all your questions.

Nov 30 '17 06:11 goodboy

The park_timeout variable is used to calculate an expiration time to be inside the park loop, not necessarily the park state. I think the problem with relying on that is that as soon as your application ends, you're back to the loop and the expiry check will cause the park loop to end and hangup the channel.

The park loop only ends when:

- The channel is hung up
- The CF_PARK or CF_CONTROLLED flag is no longer set
- Various error conditions (e.g. failed to read I/O from the channel)
- Park timeout
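The exit conditions listed above can be read as one predicate checked on each pass of the loop. A toy Python model; `ParkState` and its fields are hypothetical stand-ins for the channel flags, I/O status and computed expiry, not FreeSWITCH structures.

```python
from dataclasses import dataclass


@dataclass
class ParkState:
    """Hypothetical snapshot of what the park loop checks each iteration."""
    hung_up: bool = False
    cf_park: bool = True        # CF_PARK flag still set
    cf_controlled: bool = True  # CF_CONTROLLED flag still set
    io_error: bool = False      # e.g. failed to read I/O from the channel
    now: float = 0.0
    expires: float = 0.0        # 0 means no park_timeout was set


def park_loop_should_exit(s: ParkState) -> bool:
    """True if any of the listed exit conditions holds."""
    return (
        s.hung_up
        or not (s.cf_park and s.cf_controlled)  # either flag cleared
        or s.io_error
        or (s.expires > 0 and s.now >= s.expires)
    )
```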

I might be missing something, but I don't think that timeout would be cleared by executing applications. However, it won't trigger while you're inside a long-running application (e.g. playback, bert, or whatever). It will trigger when that application ends (if it ends after the expiry time). If you execute a never-ending application such as endless_playback or bert, the timer will appear to never fire.
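The timing described here fits a small toy model: the expiry is fixed on entering park and only checked while control is back in the park loop. A sketch under those assumptions, with applications running back to back; `park_hangup_time` and its inputs are hypothetical, not FreeSWITCH code.

```python
from typing import Iterable


def park_hangup_time(timeout: float, app_durations: Iterable[float]) -> float:
    """Seconds after entering park at which park_timeout ends the call.

    Toy model: the expiry is fixed on entry, and it is only checked in the
    park loop (before the first application, between applications, and while
    idling after the last one), never inside a running application.
    """
    now = 0.0
    for duration in app_durations:
        if now >= timeout:
            break               # checked in the loop: already expired
        now += duration         # inside an application: no expiry check
    return max(now, timeout)    # fires on return to the loop, or when idle
```

With a 5 s timeout and two 10 s playbacks the model hangs up at 10 s, right after the first playback; with a never-ending application (`float("inf")`) it never fires.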

Nov 30 '17 12:11 moises-silva

Probably the solution you're looking for is an activity timeout, e.g. quit if no commands have been received in X amount of time. It would basically be the same as the park timeout, except that it would be reset at the end of every executed command.

For outbound sockets it's different because the outbound socket explicitly clears the CF_CONTROLLED flag to exit the park loop when the socket goes down.
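The activity timeout suggested above differs from park_timeout only in when the deadline is recomputed. A minimal sketch of that bookkeeping; `ActivityTimeout` is hypothetical (not an existing FreeSWITCH or switchio feature) and takes explicit timestamps so the logic can be checked without a real clock.

```python
class ActivityTimeout:
    """Deadline pushed back each time a command finishes (hypothetical)."""

    def __init__(self, idle_seconds: float, now: float) -> None:
        self.idle_seconds = idle_seconds
        self.deadline = now + idle_seconds

    def command_finished(self, now: float) -> None:
        # Unlike park_timeout, the expiry is recomputed after every command.
        self.deadline = now + self.idle_seconds

    def expired(self, now: float) -> bool:
        return now >= self.deadline
```

For example, with a 5 s idle limit and a playback that finishes at t=10, the call is still alive at t=12 and only expires at t=15 if no further command arrives.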

Nov 30 '17 12:11 moises-silva

Note that all of this would be a workaround for a buggy call control server. Any call control server should restart if it dies (e.g. via a systemd unit restart) and either recover control of the sessions or hang up the old sessions (depending on how much state you preserve after dying).
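For the "hang up the old sessions" option, a restarted control server could list channels and kill the ones still sitting in park. A sketch, assuming `show channels as json` returns an object with a `rows` list whose entries carry `uuid` and `application` fields (verify this against your FreeSWITCH version); `orphaned_park_kills` is hypothetical and only builds `uuid_kill <uuid> [cause]` commands, leaving the ESL connection to your client library.

```python
import json
from typing import List


def orphaned_park_kills(show_channels_json: str,
                        cause: str = "NORMAL_CLEARING") -> List[str]:
    """Build uuid_kill commands for channels still running the park app.

    Assumed JSON shape: {"row_count": N, "rows": [{"uuid": ...,
    "application": ...}, ...]}; "rows" may be absent when there are none.
    """
    data = json.loads(show_channels_json)
    return [
        f"uuid_kill {row['uuid']} {cause}"
        for row in data.get("rows", [])
        if row.get("application") == "park"
    ]
```

Killing every parked channel is blunt; a server that persisted its call state would restrict this to the UUIDs it had controlled.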

Nov 30 '17 12:11 moises-silva

@moises-silva thanks for the in depth analysis! Getting the source details is super helpful.

In my proposed test it seems simply answering or bridging the session prevents the timeout as well. Maybe I should try un-parking the session and see if it times out?

Also, the condition in that while statement, particularly switch_channel_ready(channel): if you dig down, it seems to be a macro for switch_channel_test_ready called with (_channel, TRUE, FALSE), which in turn is handled in switch_channel.c and has a nasty if statement that may break the loop:

if (!channel->hangup_cause && channel->state > CS_ROUTING && channel->state < CS_HANGUP && channel->state != CS_RESET &&
    !switch_channel_test_flag(channel, CF_TRANSFER) && !switch_channel_test_flag(channel, CF_NOT_READY) &&
    !switch_channel_state_change_pending(channel)) {
    ret++;
}

In particular, if the channel is operated on by some other command, wouldn't CF_NOT_READY be set or switch_channel_state_change_pending(channel) return TRUE, making !switch_channel_test_flag(channel, CF_NOT_READY) && !switch_channel_state_change_pending(channel) false and breaking the loop?
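A Python transliteration of the quoted condition makes the question easier to poke at: a hangup cause, a state outside the (CS_ROUTING, CS_HANGUP) window, CS_RESET, CF_TRANSFER, CF_NOT_READY or a pending state change each make the channel "not ready", which ends the while loop. The enum below is assumed to mirror the order of FreeSWITCH's `switch_channel_state_t`, and `Channel` is a hypothetical stand-in; only this one if-condition is modelled, not the rest of `switch_channel_test_ready`.

```python
from dataclasses import dataclass
from enum import IntEnum


class State(IntEnum):
    # Assumed to mirror the order of switch_channel_state_t.
    CS_NEW = 0
    CS_INIT = 1
    CS_ROUTING = 2
    CS_SOFT_EXECUTE = 3
    CS_EXECUTE = 4
    CS_EXCHANGE_MEDIA = 5
    CS_PARK = 6
    CS_CONSUME_MEDIA = 7
    CS_HIBERNATE = 8
    CS_RESET = 9
    CS_HANGUP = 10
    CS_REPORTING = 11
    CS_DESTROY = 12


@dataclass
class Channel:
    state: State
    hangup_cause: int = 0
    cf_transfer: bool = False
    cf_not_ready: bool = False
    state_change_pending: bool = False


def channel_ready(ch: Channel) -> bool:
    """Mirror of the quoted if-condition (ret++ when this is True)."""
    return (
        not ch.hangup_cause
        and State.CS_ROUTING < ch.state < State.CS_HANGUP
        and ch.state != State.CS_RESET
        and not ch.cf_transfer
        and not ch.cf_not_ready
        and not ch.state_change_pending
    )
```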

Nov 30 '17 15:11 goodboy

Yeah, so if I understand this correctly, the hangup via park_timeout can only occur while you're still inside that while loop. This seems to be confirmed by the value not being read anywhere else in the code base:

 >>> git grep park_timeout
src/mod/endpoints/mod_sofia/sofia.c:                                            switch_channel_set_variable(channel_b, "park_timeout", "2:attended_transfer");
src/mod/endpoints/mod_sofia/sofia.c:                            switch_channel_set_variable(channel, "park_timeout", "600:blind_transfer");
src/mod/endpoints/mod_verto/mod_verto.c:                switch_channel_set_variable(b_tech_pvt->channel, "park_timeout", "2:attended_transfer");
src/switch_core_state_machine.c:                        switch_channel_set_variable(session->channel, "park_timeout", "10:blind_transfer");
src/switch_ivr.c:       if ((to = switch_channel_get_variable(channel, "park_timeout"))) {
src/switch_ivr.c:               switch_channel_set_variable(channel, "park_timeout", NULL);
src/switch_ivr_bridge.c:                switch_channel_set_variable(channel, "park_timeout", "3");

But maybe I'm missing something?

Nov 30 '17 15:11 goodboy

Yeah, agreed. I thought switch_channel_ready() was checking just for hangup and other media I/O conditions, but it seems a state change also makes it bail. Now I'm curious what happens after the last application executes.

Nov 30 '17 17:11 moises-silva

@moises-silva In my test, where I didn't hang up after starting a playback, the call was still hung up after the playback ended with the DESTINATION_OUT_OF_ORDER cause, which I think confirms it came from the park_timeout set in the dialplan.

Dec 01 '17 00:12 k4ml

So I tested executing playback twice. With park_timeout set, the call was hung up after the first playback.

Dec 01 '17 00:12 k4ml

@k4ml did you answer the call or put it in another state before executing playback? Can you share the dialplan you're using?

Dec 01 '17 00:12 goodboy

@tgoodlet The call was answered, and I tested with the same dialplan switchio uses:

<?xml version="1.0" encoding="utf-8"?>
<include>
  <!-- A context for relinquishing control of all calls to switchio, the inbound ESL client -->
  <context name="public">
    <!-- Park call and transfer control to esl -->
    <extension name="switchiopark">
      <condition field="destination_number" expression="^(.*)$">
        <action application="set" data="park_timeout=5:DESTINATION_OUT_OF_ORDER"/>
        <action application="park"/>
      </condition>
    </extension>
  </context>
</include>

Btw I'm not using switchio here but my own esl lib. I'll test with switchio later on.
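Since the goal of this issue is to ship the timeout in the example dialplan, the same extension could be generated so that the timeout and cause stay configurable. A sketch using the standard library's `xml.etree.ElementTree`; `build_park_dialplan` is hypothetical, and the context and extension names are copied from the dialplan above.

```python
import xml.etree.ElementTree as ET


def build_park_dialplan(timeout: int = 5,
                        cause: str = "DESTINATION_OUT_OF_ORDER") -> str:
    """Return the park-only 'public' context with park_timeout set first."""
    include = ET.Element("include")
    context = ET.SubElement(include, "context", name="public")
    ext = ET.SubElement(context, "extension", name="switchiopark")
    cond = ET.SubElement(ext, "condition",
                         field="destination_number", expression="^(.*)$")
    # set must come before park: park reads park_timeout when it starts.
    ET.SubElement(cond, "action", application="set",
                  data=f"park_timeout={timeout}:{cause}")
    ET.SubElement(cond, "action", application="park")
    return ET.tostring(include, encoding="unicode")
```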

Dec 01 '17 01:12 k4ml

@k4ml no I meant what is your ESL app doing after handling the CHANNEL_PARK event? Do you execute session.playback('blah') right away or do you session.answer() first?

Also if you'd rather get quicker feedback on this join our Riot room to chat.

Dec 01 '17 01:12 goodboy

@tgoodlet I executed session.answer() first. Here is the relevant snippet:

if commands.name == 'playback':
    if not _credits_enough(call_data['nibble_rate']):
        return
    sound_url = commands.args[0]
    session.answer()
    playback = session.playback(sound_url, **call_data)
    if playback.stop():
        session.playback(sound_url, **call_data)

Dec 01 '17 01:12 k4ml

@k4ml hmm, I wonder if it matters whether you call session.playback() only after the answer has completed, i.e. after waiting for CHANNEL_ANSWER to arrive, because that's what my test does.

I'll try the test I have with the playback like you have.

Dec 01 '17 01:12 goodboy

@tgoodlet You mean I should wait for CHANNEL_EXECUTE_COMPLETE after executing session.answer() before proceeding with playback?

Dec 01 '17 01:12 k4ml

@k4ml maybe, I'm not sure. I know that in switchio, when we do await sess.answer(), under the hood we wait for the "CHANNEL_ANSWER" event.

Let me try what you're doing before going off on a tangent trying to prove my theory correct heh.
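The ordering question here is whether the next command goes out only after the corresponding event has arrived. A toy asyncio sketch of that pattern; `EventWaiter` and the coroutines below are hypothetical and are not switchio's implementation, they just resolve a future when a named event is delivered. The waiter is registered before the command is "sent", so an event that arrives quickly is not missed.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, List


class EventWaiter:
    """Resolve futures when named events (e.g. CHANNEL_ANSWER) arrive."""

    def __init__(self) -> None:
        self._waiters: DefaultDict[str, List[asyncio.Future]] = defaultdict(list)

    def recv(self, name: str) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._waiters[name].append(fut)
        return fut

    def deliver(self, name: str, event: dict) -> None:
        # Would be called by whatever reads events off the ESL socket.
        for fut in self._waiters.pop(name, []):
            if not fut.done():
                fut.set_result(event)


async def answer_then_playback(waiter: EventWaiter, log: list) -> None:
    answered = waiter.recv("CHANNEL_ANSWER")  # register before sending
    log.append("sent answer")                 # stand-in for the answer command
    await answered                            # hold playback until answered
    log.append("sent playback")


async def main() -> list:
    log: list = []
    waiter = EventWaiter()
    task = asyncio.create_task(answer_then_playback(waiter, log))
    await asyncio.sleep(0)                    # let the task send 'answer'
    log.append("event CHANNEL_ANSWER")
    waiter.deliver("CHANNEL_ANSWER", {"Event-Name": "CHANNEL_ANSWER"})
    await task
    return log
```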

Dec 01 '17 01:12 goodboy

@k4ml ok, so I was able to replicate the situation you describe, where after playback the park timeout cause is used to hang up the call, although I can't seem to reproduce that behaviour consistently.

I'm going to investigate a little further.

Dec 01 '17 14:12 goodboy

Further progress on this: I found that the FS core exhibits unreliable uuid_broadcast behaviour, so I've deprecated its usage as part of #52. I now have playback after park working again, and it seems I never receive a PLAYBACK_STOP event until I manually kill the playback app using uuid_break. Once I do this I see the same situation as @k4ml, where the park_timeout logic activates and the session is torn down with the configured hangup cause. Luckily, for now, if uuid_break is never called (e.g. via Session.breakmedia() in switchio) the session stays in the playback app and the park_timeout never activates.

@moises-silva I personally think this is incorrect behaviour, and FS core should move this park_timeout logic further down inside switch_ivr_park, to the end of the function, so that incoming events are processed before a timeout can occur. Do you think it's worth proposing to the core team? I also think park_timeout should be a timer that is reset each time the park loop is re-entered.

Dec 03 '17 17:12 goodboy

I still see the same behaviour with this switchio snippet:

from switchio.apps.routers import Router

router = Router(
    guards={'Call-Direction': 'inbound'},
    subscribe=('PLAYBACK_STOP',),
)

@router.route('(.*)')
async def welcome(sess, match, router):
    """Say hello to inbound calls.
    """
    await sess.answer()  # resumes once call has been fully answered
    sess.log.info("Answered call to {}".format(match.groups(0)))

    sess.playback('media.mp3') # non-blocking
    sess.log.info("Playing welcome message")
    await sess.recv("PLAYBACK_STOP")
    sess.playback('media.mp3') # non-blocking
    sess.log.info("Playing again ...")
    await sess.recv("PLAYBACK_STOP")

    await sess.hangup()  # resumes once call has been fully hungup
    sess.log.info("%s hangup" % sess.uuid)

With park_timeout set, the call is hung up after the first playback with the configured hangup cause. This is on the router_extra_subscribe branch.

Dec 03 '17 22:12 k4ml

@k4ml does the file media.mp3 actually exist on your FS minion? I have seen the park_timeout kick in when playback of a file fails. I bet you'll see errors in the FS log followed by the teardown due to the timeout.

Dec 04 '17 02:12 goodboy

@tgoodlet oh, sorry. media.mp3 is just a placeholder for a real file that is fetched over HTTP. I can verify the media is being played: I can hear it, and there are no errors in the FreeSWITCH log either.

Dec 04 '17 04:12 k4ml

@k4ml yeah, looking at the core FS code more, I think we'll need to propose a patch to core to make this work the way we want. I'm happy to do this, I'm just not sure when I'll next get some time; hopefully this week.

Dec 04 '17 21:12 goodboy
