asterisk icon indicating copy to clipboard operation
asterisk copied to clipboard

[bug]: app_dial: Answer Gosub option passthrough regression

Open InterLinked1 opened this issue 2 years ago • 8 comments

Severity

Major

Versions

21.0.0

Components/Modules

app_dial

Operating Environment

Debian 12

Frequency of Occurrence

Constant

Issue Description

I seem to have encountered a strange bug that affects Asterisk 21.0.0 but not Asterisk 20.0.0. As of yet, I have not determined a specific cause; it's probably nothing to do with app_dial itself but something internal.

In Asterisk 20, the U option works as expected. Answers propagate through channels:

[aa-ans]
exten => dial,1,Dial(Local/s@aa-ans/n)
	same => n,Hangup
exten => s,1,Wait(2)
	same => n,Answer()
	same => n,Hangup()

[answer-sub]
exten => s,1,NoOp(Answered)
	same => n,Return()

[aa-call]
exten => s,1,Dial(Local/s@aa-ans/n,,U(answer-sub,s,1))
	same => n,Hangup()
exten => dial,1,Dial(Local/dial@aa-ans/n,,U(answer-sub,s,1))
	same => n,Hangup()

The "Answered" NoOp is triggered for both of the following:

channel originate Local/s@aa-call application Wait 10
channel originate Local/dial@aa-call application Wait 10

However, in Asterisk 21, the second one does not trigger this.

This has broken "dialplan ABI" since dialplan does not behave the same in 20 and 21.

Relevant log output

No response

Asterisk Issue Guidelines

  • [X] Yes, I have read the Asterisk Issue Guidelines

InterLinked1 avatar Oct 30 '23 16:10 InterLinked1

What am I expecting to see broken out of this:

*CLI> channel originate Local/dial@aa-call application Wait 10 *CLI> -- Called dial@aa-call -- Executing [dial@aa-call:1] Dial("Local/dial@aa-call-00000000;2", "Local/dial@aa-ans/n,,U(answer-sub,s,1)") in new stack -- Called Local/dial@aa-ans/n -- Executing [dial@aa-ans:1] Dial("Local/dial@aa-ans-00000001;2", "Local/s@aa-ans/n") in new stack -- Called Local/s@aa-ans/n -- Executing [s@aa-ans:1] Wait("Local/s@aa-ans-00000002;2", "2") in new stack -- Executing [s@aa-ans:2] Answer("Local/s@aa-ans-00000002;2", "") in new stack -- Local/s@aa-ans-00000002;1 answered Local/dial@aa-ans-00000001;2 -- Local/dial@aa-ans-00000001;1 answered Local/dial@aa-call-00000000;2 -- Local/dial@aa-ans-00000001;1 Internal Gosub(answer-sub,s,1(s,1)) start -- Executing [s@answer-sub:1] NoOp("Local/dial@aa-ans-00000001;1", "Answered") in new stack -- Executing [s@answer-sub:2] Return("Local/dial@aa-ans-00000001;1", "") in new stack == Spawn extension (aa-ans, , 1) exited non-zero on 'Local/dial@aa-ans-00000001;1' -- Local/dial@aa-ans-00000001;1 Internal Gosub(answer-sub,s,1(s,1)) complete GOSUB_RETVAL= -- Channel Local/s@aa-ans-00000002;1 joined 'simple_bridge' basic-bridge -- Channel Local/dial@aa-ans-00000001;2 joined 'simple_bridge' basic-bridge -- Local/dial@aa-call-00000000;1 answered > Launching Wait(10) on Local/dial@aa-call-00000000;1 -- Channel Local/dial@aa-ans-00000001;1 joined 'simple_bridge' basic-bridge -- Channel Local/dial@aa-call-00000000;2 joined 'simple_bridge' basic-bridge -- Executing [s@aa-ans:3] Hangup("Local/s@aa-ans-00000002;2", "") in new stack == Spawn extension (aa-ans, s, 3) exited non-zero on 'Local/s@aa-ans-00000002;2' -- Channel Local/s@aa-ans-00000002;1 left 'simple_bridge' basic-bridge -- Channel Local/dial@aa-ans-00000001;2 left 'simple_bridge' basic-bridge == Spawn extension (aa-ans, dial, 1) exited non-zero on 'Local/dial@aa-ans-00000001;2' -- Channel Local/dial@aa-ans-00000001;1 left 'simple_bridge' basic-bridge -- Channel Local/dial@aa-call-00000000;2 left 'simple_bridge' basic-bridge == Spawn extension (aa-call, dial, 1) exited non-zero on 'Local/dial@aa-call-00000000;2'

(Run on 21.0.0 with provided dialplan)

jcolp avatar Oct 30 '23 16:10 jcolp

What am I expecting to see broken out of this:

*CLI> channel originate Local/dial@aa-call application Wait 10 *CLI> -- Called dial@aa-call -- Executing [dial@aa-call:1] Dial("Local/dial@aa-call-00000000;2", "Local/dial@aa-ans/n,,U(answer-sub,s,1)") in new stack -- Called Local/dial@aa-ans/n -- Executing [dial@aa-ans:1] Dial("Local/dial@aa-ans-00000001;2", "Local/s@aa-ans/n") in new stack -- Called Local/s@aa-ans/n -- Executing [s@aa-ans:1] Wait("Local/s@aa-ans-00000002;2", "2") in new stack -- Executing [s@aa-ans:2] Answer("Local/s@aa-ans-00000002;2", "") in new stack -- Local/s@aa-ans-00000002;1 answered Local/dial@aa-ans-00000001;2 -- Local/dial@aa-ans-00000001;1 answered Local/dial@aa-call-00000000;2 -- Local/dial@aa-ans-00000001;1 Internal Gosub(answer-sub,s,1(s,1)) start -- Executing [s@answer-sub:1] NoOp("Local/dial@aa-ans-00000001;1", "Answered") in new stack -- Executing [s@answer-sub:2] Return("Local/dial@aa-ans-00000001;1", "") in new stack == Spawn extension (aa-ans, , 1) exited non-zero on 'Local/dial@aa-ans-00000001;1' -- Local/dial@aa-ans-00000001;1 Internal Gosub(answer-sub,s,1(s,1)) complete GOSUB_RETVAL= -- Channel Local/s@aa-ans-00000002;1 joined 'simple_bridge' basic-bridge -- Channel Local/dial@aa-ans-00000001;2 joined 'simple_bridge' basic-bridge -- Local/dial@aa-call-00000000;1 answered

Launching Wait(10) on Local/dial@aa-call-00000000;1 -- Channel Local/dial@aa-ans-00000001;1 joined 'simple_bridge' basic-bridge -- Channel Local/dial@aa-call-00000000;2 joined 'simple_bridge' basic-bridge -- Executing [s@aa-ans:3] Hangup("Local/s@aa-ans-00000002;2", "") in new stack == Spawn extension (aa-ans, s, 3) exited non-zero on 'Local/s@aa-ans-00000002;2' -- Channel Local/s@aa-ans-00000002;1 left 'simple_bridge' basic-bridge -- Channel Local/dial@aa-ans-00000001;2 left 'simple_bridge' basic-bridge == Spawn extension (aa-ans, dial, 1) exited non-zero on 'Local/dial@aa-ans-00000001;2' -- Channel Local/dial@aa-ans-00000001;1 left 'simple_bridge' basic-bridge -- Channel Local/dial@aa-call-00000000;2 left 'simple_bridge' basic-bridge == Spawn extension (aa-call, dial, 1) exited non-zero on 'Local/dial@aa-call-00000000;2'

(Run on 21.0.0 with provided dialplan)

I'd be expecting to see this in all cases:

NoOp("Local/dial@aa-ans-00000001;1", "Answered") in new stack

Looks like it works for you.

I just installed a vanilla 21.0.0 on a different machine (but with custom config files) to test and encountered different behavior, strangely:

[2023-10-30 16:44:44.046]     -- Called s@aa-call
[2023-10-30 16:44:44.047]     -- Executing [s@aa-call:1] Dial("Local/s@aa-call-00000000;2", "Local/s@aa-ans/n,,U(answer-sub,s,1)") in new stack
[2023-10-30 16:44:44.047]     -- Called Local/s@aa-ans/n
[2023-10-30 16:44:44.048]     -- Executing [s@aa-ans:1] Wait("Local/s@aa-ans-00000001;2", "2") in new stack
[2023-10-30 16:44:46.050]     -- Executing [s@aa-ans:2] Answer("Local/s@aa-ans-00000001;2", "") in new stack
[2023-10-30 16:44:46.051]     -- Local/s@aa-ans-00000001;1 answered Local/s@aa-call-00000000;2
[2023-10-30 16:44:46.051]     -- Local/s@aa-ans-00000001;1 Internal Gosub(answer-sub,s,1(s,1)) start
[2023-10-30 16:44:46.051]     -- Executing [s@answer-sub:1] NoOp("Local/s@aa-ans-00000001;1", "Answered") in new stack
[2023-10-30 16:44:46.051]     -- Executing [s@answer-sub:2] Return("Local/s@aa-ans-00000001;1", "") in new stack
[2023-10-30 16:44:46.051]   == Spawn extension (aa-ans, , 1) exited non-zero on 'Local/s@aa-ans-00000001;1'
[2023-10-30 16:44:46.051]     -- Local/s@aa-ans-00000001;1 Internal Gosub(answer-sub,s,1(s,1)) complete GOSUB_RETVAL=
[2023-10-30 16:44:46.052]     -- Channel Local/s@aa-ans-00000001;1 joined 'simple_bridge' basic-bridge <5a30bfb4-bf3e-4d3d-9d67-84547314788a>
[2023-10-30 16:44:46.052]     -- Channel Local/s@aa-call-00000000;2 joined 'simple_bridge' basic-bridge <5a30bfb4-bf3e-4d3d-9d67-84547314788a>
[2023-10-30 16:44:46.552]     -- Executing [s@aa-ans:3] Hangup("Local/s@aa-ans-00000001;2", "") in new stack
[2023-10-30 16:44:46.552]   == Spawn extension (aa-ans, s, 3) exited non-zero on 'Local/s@aa-ans-00000001;2'
[2023-10-30 16:44:46.553]     -- Channel Local/s@aa-ans-00000001;1 left 'simple_bridge' basic-bridge <5a30bfb4-bf3e-4d3d-9d67-84547314788a>
[2023-10-30 16:44:46.553]     -- Channel Local/s@aa-call-00000000;2 left 'simple_bridge' basic-bridge <5a30bfb4-bf3e-4d3d-9d67-84547314788a>
[2023-10-30 16:44:46.553]   == Spawn extension (aa-call, s, 1) exited non-zero on 'Local/s@aa-call-00000000;2'
voip*CLI> channel originate Local/dial@aa-call application Wait 10
[2023-10-30 16:44:50.302]     -- Called dial@aa-call
[2023-10-30 16:44:50.303]     -- Executing [dial@aa-call:1] Dial("Local/dial@aa-call-00000002;2", "Local/dial@aa-ans/n,,U(answer-sub,s,1)") in new stack
[2023-10-30 16:44:50.303]     -- Called Local/dial@aa-ans/n
[2023-10-30 16:44:50.303]     -- Executing [dial@aa-ans:1] Dial("Local/dial@aa-ans-00000003;2", "Local/s@aa-ans/n") in new stack
[2023-10-30 16:44:50.304]     -- Called Local/s@aa-ans/n
[2023-10-30 16:44:50.304]     -- Executing [s@aa-ans:1] Wait("Local/s@aa-ans-00000004;2", "2") in new stack
[2023-10-30 16:44:52.306]     -- Executing [s@aa-ans:2] Answer("Local/s@aa-ans-00000004;2", "") in new stack
[2023-10-30 16:44:52.307]     -- Local/s@aa-ans-00000004;1 answered Local/dial@aa-ans-00000003;2
[2023-10-30 16:44:52.307]     -- Channel Local/s@aa-ans-00000004;1 joined 'simple_bridge' basic-bridge <dde54b0e-e9aa-4c73-88d4-1770c451eac7>
[2023-10-30 16:44:52.307]     -- Channel Local/dial@aa-ans-00000003;2 joined 'simple_bridge' basic-bridge <dde54b0e-e9aa-4c73-88d4-1770c451eac7>
[2023-10-30 16:44:52.308]     -- Local/dial@aa-ans-00000003;1 requested media update control 26, passing it to Local/dial@aa-call-00000002;2
[2023-10-30 16:44:52.308]     -- Local/dial@aa-ans-00000003;1 stopped sounds
[2023-10-30 16:44:52.808]     -- Executing [s@aa-ans:3] Hangup("Local/s@aa-ans-00000004;2", "") in new stack
[2023-10-30 16:44:52.808]   == Spawn extension (aa-ans, s, 3) exited non-zero on 'Local/s@aa-ans-00000004;2'
[2023-10-30 16:44:52.809]     -- Channel Local/s@aa-ans-00000004;1 left 'simple_bridge' basic-bridge <dde54b0e-e9aa-4c73-88d4-1770c451eac7>
[2023-10-30 16:44:52.809]     -- Channel Local/dial@aa-ans-00000003;2 left 'simple_bridge' basic-bridge <dde54b0e-e9aa-4c73-88d4-1770c451eac7>
[2023-10-30 16:44:52.809]   == Spawn extension (aa-ans, dial, 1) exited non-zero on 'Local/dial@aa-ans-00000003;2'
[2023-10-30 16:44:52.809]     -- Local/dial@aa-ans-00000003;1 requested media update control 26, passing it to Local/dial@aa-call-00000002;2
[2023-10-30 16:44:52.809]     -- No one is available to answer at this time (1:0/0/0)
[2023-10-30 16:44:52.809]     -- Executing [dial@aa-call:2] Hangup("Local/dial@aa-call-00000002;2", "") in new stack

Might be caused by some configuration option, possibly... I'll continue investigating.

InterLinked1 avatar Oct 30 '23 16:10 InterLinked1

It's probably a race condition with the closeness of the Answer() and Hangup() in the provided example and thus different across different systems, and even different versions depending on changes made in places.

jcolp avatar Oct 30 '23 17:10 jcolp

It's probably a race condition with the closeness of the Answer() and Hangup() in the provided example and thus different across different systems, and even different versions depending on changes made in places.

Can't be, there's a 2 second difference, that would only make sense if it was under a few hundred milliseconds. And in other tests, the answer stuff never happens, no matter how long the channel stays up.

The test I had just done was on an old server with a modified 20.0.0, working, and an unmodified 21.0.0, not working (but with configuration intact).

I tried earlier with a frame trace as well, I think somehow the answer is getting dropped when it should be propagated downstream. As my next test I'll start with sample configs and slowly replace them back with custom ones.

InterLinked1 avatar Oct 30 '23 17:10 InterLinked1

It's probably a race condition with the closeness of the Answer() and Hangup() in the provided example and thus different across different systems, and even different versions depending on changes made in places.

Upon further investigation, this seems to happen only when the following global variable is defined:

BRIDGE_NOANSWER=1

I traced it down to commenting that out in [globals] when booting Asterisk.

I've had this configured for a while now, as it's necessary to make Asterisk behave less stupidly in certain scenarios.

Obviously, it sounds clearly related, but that never produced this effect in Asterisk 20, only in Asterisk 21. Has some internal bridging thing changed in the past year that might be related? I was looking out for stuff like this but might have missed something.

InterLinked1 avatar Oct 30 '23 20:10 InterLinked1

Nothing immediately comes to mind. You would need to start looking at commits.

jcolp avatar Oct 30 '23 20:10 jcolp

git bisect

seanbright avatar Oct 30 '23 21:10 seanbright

git bisect

Thanks, just from looking, I've already located it - ironically, it's actually one of my own commits: 7cbf779f139d2da860c56adb7987a61d81693c4d

I would've thought this would've been in Asterisk 20.0.0 from the merge date, but apparently not. I cherry picked several changes in over the last year to avoid reinstalling, but I guess not that one. I forgot I even added that, I thought that was there all along.

The previous commit is, I think doing what it's supposed to, so this is partly on me for having it set as a global variable. I think that was a lazy way of trying to fix problems everywhere in a single keystroke. The idea is if manually joining a bridge, then one should be able to inhibit the auto answer functionality, but for bridges that are just connecting channels together, like with Dial, it certainly makes sense to relay the answer. I'll have to see if there's an elegant way to accommodate that and a good heuristic to determine if this is the case. If that's not easily doable, I should update the documentation with a warning not to set this as a global variable. Either way, you can assign this to me for now.

InterLinked1 avatar Oct 30 '23 21:10 InterLinked1