freeswitch
Freeswitch leaves stale calls
Describe the bug
FreeSWITCH leaves a lot of stale calls in memory and in the database. In the past we saw this only rarely, and then only as stale calls in the database, not in memory. However, the same day we started using the latest FreeSWITCH version, 1.10.11, we saw a lot of stale calls, and uuid_exists returned true for them, indicating the calls were still in memory. So this is a more serious bug.
I've seen this very same issue reported on the Slack channel: after an update to the latest FreeSWITCH there is an increase in stale calls.
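To separate calls that are stale only in the database from those still held in memory, the uuid_exists check can be scripted. This is a minimal sketch, assuming `fs_cli` is on the PATH and that `uuid_exists` prints `true` or `false`. The names `still_in_memory` and the injectable `run` parameter are hypothetical and used only for illustration.

```python
import subprocess


def fs_cli(command):
    """Run one API command through fs_cli and return its stdout.

    Assumes the fs_cli binary is on PATH and can reach the local FreeSWITCH.
    """
    result = subprocess.run(
        ["fs_cli", "-x", command],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def still_in_memory(uuids, run=fs_cli):
    """Return the UUIDs for which uuid_exists answers 'true'."""
    live = []
    for uuid in uuids:
        answer = run(f"uuid_exists {uuid}").strip().lower()
        if answer == "true":
            live.append(uuid)
    return live
```

Feeding it the UUIDs of suspect database rows gives the subset that FreeSWITCH still holds as sessions. It does not tell you whether those sessions are deadlocked.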
Package version or git hash
- Version 1.10.11
I took a core dump and found deadlocks causing the calls to get stuck. The gdb output is from just one call, but all the others look very similar.
The gdb output is from the following commands.
thread 78
bt
frame 3
p *mutex
thread 79
bt
frame 3
p *mutex
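The reason for printing the mutex in each thread is to see who owns it. Assuming the printed structure exposes the glibc `pthread_mutex_t` field `__data.__owner`, that field holds the thread ID (LWP) of the current holder. If thread A waits on a mutex owned by B while B waits on a mutex owned by A, neither can progress. A minimal sketch of that check follows. The LWP numbers are made up, `find_deadlock_cycle` is a hypothetical helper, and none of it is taken from the actual core dump.

```python
def find_deadlock_cycle(waits_on_owner):
    """Find a cycle among blocked threads.

    waits_on_owner maps a blocked thread's LWP to the LWP that owns the
    mutex it is waiting on. Returns the LWPs forming a cycle, or None.
    """
    for start in waits_on_owner:
        path = []
        seen = set()
        current = start
        # Follow the "waits on owner" chain until it ends or repeats.
        while current in waits_on_owner and current not in seen:
            seen.add(current)
            path.append(current)
            current = waits_on_owner[current]
        # The chain came back to where it started: that is a deadlock cycle.
        if current == start and path:
            return path
    return None


# Made-up LWP ids: each thread waits on a mutex the other one owns.
example = {1001: 1002, 1002: 1001}
print(find_deadlock_cycle(example))
```

The mapping would be filled in by hand from the `bt` and `p *mutex` output of each blocked thread.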
It appears this is the same issue as https://github.com/signalwire/freeswitch/issues/2290
I have the same issue with 1.10.11. I will try 1.10.9 and 1.10.10 and report back with my findings.
Hi everyone, is this issue linked to audio transcoding? We don't have any transcoding on our side, so I would like to know before upgrading to the latest version. Thanks in advance. Best regards
Hello, which OS version are you running? Since we upgraded to Debian 12 with FreeSWITCH v1.10.11 we have had the same issue: a lot of calls are not hung up by FS.
After checking in detail, a lot of packets (200 OK, BYE) seem to be randomly ignored by FS, so after a few days we have a lot of "ghost" calls in memory, visible in fs_cli.
Is it the same issue for you?
Regards,
Take a look at the Contact header sent in the INVITE. It seems the other end is sending all its answers to an address that is not the one FS is listening on (assuming FS sent the second INVITE...)
Yes, everything is fine on the Contact side. It's totally random, across several thousand calls per day.
I am using Debian 12. I have not noticed any issue with 200 OK. I will take a closer look and report if I find anything in that regard.
The issue isn't related to transcoding. See issue https://github.com/signalwire/freeswitch/issues/2290 for the details of this deadlock.
@technophreak I tried to reproduce it in different ways with SIPp in all directions, but I could not.
It happens randomly, though, and after a few days we have around 100 fake active calls on FreeSWITCH (fs_cli -x show calls) due to this problem (when we check, the 200 OK or BYE is ignored).
Of course, the Contact, Via and other fields seem fine.
For me it started after upgrading to Debian 12 and 1.10.10. I've upgraded to 1.10.11 to test, but it's no better.
If I can help, tell me. Thanks a lot.
@bferreirq Yeah, in my case those stale calls are not even properly killed with uuid_kill; there are remnants. I suspect it is not directly related; perhaps your 200 OK issue is just a symptom of the same issue.
You cannot kill those calls because they're mutex locked. The only way to get rid of them is to restart freeswitch.
Yes, we already schedule restarts of FreeSWITCH to kill these calls.
@bferreirq The issue I have is that those calls leave remnants (they still appear in show calls) after they get killed, and I only restart when there are no more calls in progress.
Although rare, we do have very long calls, so I can't just assume that a 3 hour call is a ghost call and I have no indication if this call is still really legitimately connected.
This is a very annoying issue.
@greenbea Thanks for that info.
Have you found a way to know (via uuid_dump for example) if they are still really connected ?
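One rough way to at least shortlist candidates is to flag calls by age. This is a sketch, not a way to confirm that a call is a ghost: as noted above, a legitimate call can also last several hours. It assumes the JSON printed by `fs_cli -x "show calls as json"` has a `rows` list whose entries carry `uuid` and `created_epoch`. The function name and threshold are hypothetical.

```python
import json
import time


def long_running_calls(show_calls_json, min_age_seconds, now=None):
    """Return (uuid, age_seconds) for calls older than min_age_seconds.

    Assumes each row carries 'uuid' and 'created_epoch'; rows missing
    either field are skipped. Output with no 'rows' key yields [].
    """
    now = time.time() if now is None else now
    data = json.loads(show_calls_json)
    suspects = []
    for row in data.get("rows", []):
        uuid = row.get("uuid")
        created = row.get("created_epoch")
        if not uuid or not created:
            continue
        age = int(now) - int(created)
        if age >= min_age_seconds:
            suspects.append((uuid, age))
    return suspects
```

The resulting UUIDs still need a manual check, for example with uuid_dump or against signalling traces, before anything is killed.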
In my specific scenario, the issue only occurred when transcoding inbound calls. It does not mean it only affects inbound calls, it's just that with my current configuration only inbound is being affected.
I've completed my tests and I can confirm for sure that v1.10.8 and v1.10.9 do not have this issue.
I will go ahead and test version 1.10.10 now and report back.
Could you please share your SIP scenario?
@bferreirq The calls that seem to trigger this issue most often are as follows:
- [A LEG] Call comes in on sip_profile (operator) with G729,PCMU as codecs offered. Profile set to greedy, disable transcoding on, late negotiation on
- [B LEG] Call is bridged to user on sip_profile (customer) with PCMU as codec accepted. Profile set to greedy, disable transcoding off, late negotiation on
--
I have however seen many calls that are PCMU all the way get stuck as well.
I can confirm I am also getting stale/stuck calls with v1.10.10
As far as I am concerned, here is my diagnosis:
- v1.10.8 : Not affected
- v1.10.9 : Not affected
- v1.10.10 : Affected
- v1.10.11 : Affected
--
To add insult to injury, I cannot even use v1.10.9 because it seems to have issues with rxfax for one of my carriers.
I have the same problem in 1.10.10-release. Although the actual call has ended, the channel inside Freeswitch has not been hung up, so the CDR has not been updated.
We had the same issue, you should check the following patch. https://github.com/signalwire/freeswitch/pull/2300
Could you tell me if this PR has resolved the issue?