
Freeswitch leaves stale calls

Open · greenbea opened this issue on Jan 26 '24 · 22 comments

Describe the bug

FreeSWITCH leaves a lot of stale calls in memory and in the DB. In the past we experienced this rarely, and the stale calls were only in the database, not in memory. However, on the same day we started using the latest FreeSWITCH version, 1.10.11, we saw a lot of stale calls, and uuid_exists returned true for them, indicating the calls were still in memory. So this is a more serious bug.

I've seen the same issue reported on the Slack channel: after updating to the latest FreeSWITCH, there's an increase in stale calls.

Package version or git hash

  • Version 1.10.11

greenbea, Jan 26 '24

I took a core dump and found deadlocks causing the calls to get stuck. The gdb output covers just one call, but all the others look very similar.

The gdb output was produced with the following commands:

thread 78
bt
frame 3
p *mutex
thread 79
bt
frame 3
p *mutex
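The commands above switch to each of two threads (78 and 79), print its backtrace, and dump the mutex it is waiting on in frame 3. One common way two threads end up blocked like this is lock-order inversion: each thread holds one lock and waits for the lock the other holds. The details of the actual FreeSWITCH deadlock are in issue #2290; the sketch below is only a generic Python illustration of the pattern, not FreeSWITCH code, and the lock names are hypothetical. It uses `acquire(timeout=...)` so the demo reports the deadlock instead of hanging the way a plain blocking lock would.

```python
import threading


def lock_order_inversion_demo(timeout: float = 0.5) -> list[str]:
    """Two threads take the same two locks in opposite order.

    Returns the names of the threads that could not get their second lock.
    """
    # Hypothetical stand-ins for two per-call mutexes (not FreeSWITCH names).
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    holding_first = threading.Barrier(2)  # both threads now hold their first lock
    done_trying = threading.Barrier(2)    # keep first locks held until both tried
    stuck: list[str] = []
    stuck_guard = threading.Lock()

    def worker(name: str, first: threading.Lock, second: threading.Lock) -> None:
        with first:
            holding_first.wait()
            # A plain second.acquire() would block forever at this point,
            # which is what a deadlocked call thread looks like in gdb.
            got_second = second.acquire(timeout=timeout)
            if not got_second:
                with stuck_guard:
                    stuck.append(name)
            done_trying.wait()
            if got_second:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("thread 1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("thread 2", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(stuck)


if __name__ == "__main__":
    print(lock_order_inversion_demo())  # both threads report being stuck
```

With blocking locks, as in C code using `pthread_mutex_lock`, neither thread would ever return, and anything else that needs either lock (for example, a request to tear down the call) would block as well. The usual remedy for this pattern is to make every code path acquire the locks in one consistent order.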

greenbea, Jan 26 '24

It appears this is the same issue as https://github.com/signalwire/freeswitch/issues/2290

greenbea, Jan 26 '24

I have the same issue with 1.10.11. I will try 1.10.9 and 1.10.10 and report back with my findings.

technophreak, Feb 02 '24

Hi everyone, is this issue linked to audio transcoding? We don't do any transcoding on our side, so I would like to know before upgrading to the latest version. Thanks in advance. Best regards

wesmu, Feb 02 '24

Hello, which OS version are you running? Since we upgraded to Debian 12 with FreeSWITCH v1.10.11, we have the same issue: a lot of calls are not hung up by FS.

After checking in detail, a lot of packets (200 OK, BYE) seem to be randomly ignored by FS. So after a few days we have a lot of "ghost" calls in memory, visible in fs_cli.

Same issue for you?


Regards,

bferreirq, Feb 05 '24


Take a look at the Contact header sent in the INVITE. It seems the other end is sending all the answers to an address that is not where FS is listening (assuming FS sent the second INVITE...).
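A quick way to act on this suggestion is to pull the host and port out of the Contact URI and compare them with the address the FreeSWITCH SIP profile listens on. Under RFC 3261 the Contact URI is the remote target for later in-dialog requests such as BYE, while responses such as 200 OK are routed using the Via header, so both headers are worth checking. The helper names below are hypothetical and the parser is a simplified sketch: it only looks at the first `sip:`/`sips:` URI in the header and assumes default ports of 5060 and 5061.

```python
import re

# First sip:/sips: URI in a Contact header value: captures the scheme,
# the host (IPv4, hostname, or bracketed IPv6) and an optional port.
_SIP_URI = re.compile(
    r"(sips?):(?:[^@>;]+@)?(\[[^\]]+\]|[^:;>]+)(?::(\d+))?",
    re.IGNORECASE,
)


def contact_target(header_value: str) -> tuple[str, int]:
    """Return (host, port) from a Contact header value (hypothetical helper)."""
    m = _SIP_URI.search(header_value)
    if m is None:
        raise ValueError(f"no SIP URI in {header_value!r}")
    scheme = m.group(1).lower()
    host = m.group(2)
    if m.group(3):
        port = int(m.group(3))
    else:
        # Assumed defaults: 5060 for sip:, 5061 for sips:
        port = 5061 if scheme == "sips" else 5060
    return host, port


def contact_matches_listener(header_value: str, listen_host: str, listen_port: int) -> bool:
    """True if the Contact URI points at the given listening address."""
    return contact_target(header_value) == (listen_host, listen_port)


if __name__ == "__main__":
    hdr = "Contact: <sip:1000@203.0.113.5:5080;transport=udp>"
    print(contact_target(hdr))
    print(contact_matches_listener(hdr, "203.0.113.5", 5060))
```

The addresses above are documentation examples; in practice `listen_host` and `listen_port` would come from the profile's sip-ip/sip-port (or ext-sip-ip behind NAT).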

wesmu, Feb 05 '24

Yes, everything is fine on the Contact side; it's totally random across several thousand calls per day.

bferreirq, Feb 05 '24

I am using Debian 12. I have not noticed any issue with 200 OK. I will take a closer look and report back if I find anything in that regard.

technophreak, Feb 05 '24


The issue isn't related to transcoding. See issue https://github.com/signalwire/freeswitch/issues/2290 for the details of this deadlock.

greenbea, Feb 05 '24

@technophreak I tried to reproduce it in different ways with SIPp in all directions, and I could not reproduce it.

But it happens randomly, and after a few days we have around 100 fake active calls on FreeSWITCH (fs_cli -x "show calls") because of this problem (when we check, the 200 OK or BYE was ignored).

Of course, Contact, Via, and the other fields look correct.

For me it started after upgrading to Debian 12 and 1.10.10. I've upgraded to 1.10.11 to test, but it's no better.

If I can help, let me know. Thanks a lot.

bferreirq, Feb 05 '24

@bferreirq Yeah, in my case those stale calls are not even properly killed with uuid_kill; there are remnants. I suspect it is not directly related; perhaps your 200 OK issue is just a symptom of the same underlying problem.

technophreak, Feb 05 '24

You cannot kill those calls because they're stuck on a locked mutex. The only way to get rid of them is to restart FreeSWITCH.

greenbea, Feb 05 '24

Yes, we already schedule FreeSWITCH restarts to clear these calls.

bferreirq, Feb 05 '24

@bferreirq The issue I have is that those calls leave remnants (they still appear in show calls) after being killed, and I only restart when there are no more calls in progress.

Although rare, we do have very long calls, so I can't just assume that a 3-hour call is a ghost call, and I have no way to tell whether such a call is still legitimately connected.

This is a very annoying issue.

technophreak, Feb 05 '24

@greenbea Thanks for that info.

Have you found a way to tell (via uuid_dump, for example) whether they are still really connected?

technophreak, Feb 05 '24

In my specific scenario, the issue only occurred when transcoding inbound calls. It does not mean it only affects inbound calls, it's just that with my current configuration only inbound is being affected.

I've completed my tests and can confirm that v1.10.8 and v1.10.9 do not have this issue.

I will go ahead and test version 1.10.10 now and report back.

technophreak, Feb 13 '24


Could you please share your SIP scenario?

bferreirq, Feb 14 '24

@bferreirq The calls that seem to trigger this issue most often are as follows:

  • [A LEG] Call comes in on sip_profile (operator) with G729,PCMU offered; profile set to greedy, disable transcoding on, late negotiation on
  • [B LEG] Call is bridged to a user on sip_profile (customer) with PCMU as the accepted codec; profile set to greedy, disable transcoding off, late negotiation on

--

I have however seen many calls that are PCMU all the way get stuck as well.

technophreak, Feb 15 '24

I can confirm I am also getting stale/stuck calls with v1.10.10

As far as I am concerned, here is my diagnosis:

  • v1.10.8 : Not affected
  • v1.10.9 : Not affected
  • v1.10.10 : Affected
  • v1.10.11 : Affected

--

To add insult to injury, I cannot even use v1.10.9 because it seems to have issues with rxfax for one of my carriers.

technophreak, Feb 15 '24


I have the same problem in 1.10.10-release. Although the actual call has ended, the channel inside FreeSWITCH has not been hung up, so the CDR has not been updated.

phamhieptel4vn, Apr 04 '24

We had the same issue; you should check the following patch: https://github.com/signalwire/freeswitch/pull/2300

shaunjstokes, Apr 04 '24


Could you tell me whether this PR resolved the issue?

phamhieptel4vn, Apr 04 '24