docker_open5gs
docker_open5gs copied to clipboard
ims/volte stops working after expiry of timers (4g setup)
Volte calls between phones work fine after starting the containers, but after 1-2 minutes one of the configured expiry timers in kamailio_pcscf.cfg appears to trigger something and at that point volte calls stop working.
my current, very hacky "fix":
modparam("ims_registrar_pcscf", "is_registered_fallback2ip", 1)
modparam("ims_registrar_pcscf", "ignore_reg_state", 1)
modparam("ims_registrar_pcscf", "ignore_contact_rxport_check", 1)
-modparam("ims_registrar_pcscf", "pending_reg_expires", 30)
-modparam("ims_registrar_pcscf", "subscription_expires", 36000)
-modparam("ims_registrar_pcscf", "delete_delay", CONTACT_DELETE_DELAY)
-modparam("ims_usrloc_pcscf", "expires_grace", 120)
+modparam("ims_registrar_pcscf", "pending_reg_expires", 99999)
+modparam("ims_registrar_pcscf", "subscription_expires", 99999)
+modparam("ims_registrar_pcscf", "delete_delay", 5)
+modparam("ims_usrloc_pcscf", "expires_grace", 9999999)
#!ifdef WITH_REGINFO
modparam("ims_registrar_pcscf", "subscribe_to_reginfo", 1)
@@ -428,7 +428,7 @@ modparam("ims_qos", "confirmed_qosrelease_headers", "X-Reason: QoS failed\r\n")
modparam("ims_qos", "authorize_video_flow", 1)
modparam("ims_qos", "af_signaling_ip", RX_AF_SIGNALING_IP)
modparam("ims_qos", "include_rtcp_fd", 1)
-modparam("ims_qos", "rx_auth_expiry", 36000)
+modparam("ims_qos", "rx_auth_expiry", 999999999)
modparam("ims_qos", "recv_mode", 1)
modparam("ims_qos", "dialog_direction", RX_IMS_REG_DIALOG_DIRECTION)
#!endif
Those diff does not make sense. Please post a pcap of the issue you are facing and I can take a look.
Which parts do you need? It appears to be a pcscf issue, only half of the deregistration happens, the other half is perpertually stuck in a pending state, while the "valid" contacts on the scscf side keep piling up with the expiry supplied by the ue, which is 600000 seconds.
Maybe related: https://github.com/kamailio/kamailio/issues/3570
pending_reg_expires
The above modparam is related to UEs whose IMS registration remain in pending state and not for UEs who register with the IMS successfully i.e. 200 OK for SIP REGISTER is received.
subscription_expires
This is when the subscription of the UE expires at P-CSCF. 36000 equates to 10hrs
delete_delay
This is the time after which IPSec connection is removed once the UE de-registers from the network
so I am not sure its a P-CSCF issue
Which parts do you need?
I suggest taking the pcap on the 'any' interface without any filters. The scenario should be as follows
- Start the trace
- Register two UEs
- Wait for 1-2 mins
- Attempt a call
- Stop the trace and post it here
You need to remove the ".zip" file suffix, zip was way too large to attach, so i had to use zstd. broken_after_3min.pcapng.zst.zip
The packets in the pcap file didnt feel to be in right order I saw diameter UAR request before even receiving SIP REGISTER. Can you please ensure that time stamps on the machine running EPC + IMS and UE are the same? And, UE is set to receive time from the network?
Also, can you tell me which eNB you are using?
Sorry, cant be of much help on this. But all I can confirm is that its not due to those values in modparam you are facing this issue.
Just increasing pending_reg_expires is enough to "fix" the issue, for some reason waiting longer than (pending_reg_expires + expires_grace) appears to be time stuff starts failing. It can also be "fixed" by putting the phones in airplane mode, and turning them on again (unitl it fails again). No idea what is going on with the timestamps tbh.. I usually don't look at all interfaces because I don't want to see everything five times, so I never noticed that ;)
Can you post a pcap of the working call scenario with increased pending reg expire timer?
enb is srsenb, btw.
One "fixed" attempt: pending_9999999_callfix.pcapng.zst.zip
And another try, I can basically watch it fail really fast even before making a call by doing this:
--- a/pcscf/kamailio_pcscf.cfg
+++ b/pcscf/kamailio_pcscf.cfg
@@ -309,7 +309,7 @@ modparam("uac","restore_mode","none")
# ----------------- Settings for RTimer ---------------
# time interval set to 60 seconds
-modparam("rtimer", "timer", "name=NATPING;interval=60;mode=1;")
+modparam("rtimer", "timer", "name=NATPING;interval=2;mode=1;")
modparam("rtimer", "exec", "timer=NATPING;route=NATPING")
#!endif
@@ -398,7 +398,7 @@ modparam("ims_registrar_pcscf", "ignore_contact_rxport_check", 1)
modparam("ims_registrar_pcscf", "pending_reg_expires", 30)
modparam("ims_registrar_pcscf", "subscription_expires", 36000)
modparam("ims_registrar_pcscf", "delete_delay", CONTACT_DELETE_DELAY)
-modparam("ims_usrloc_pcscf", "expires_grace", 120)
+modparam("ims_usrloc_pcscf", "expires_grace", 12)
modparam("ims_usrloc_pcscf", "timer_interval", 2)
The point at which a call will fail is basically as soon as the OPTIONS "nat ping" breaks, which is pretty much after 30+12s after the succesful registration according to the second pcap (below), pcscf log for that point in time:
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf | 107(144) ERROR: <script>: request sent to sip:[email protected]:41786 completed with code: 200, Type 1
pcscf | 104(141) ERROR: <script>: request sent to sip:[email protected]:40643 completed with code: 200, Type 1
pcscf | 85(122) INFO: cdp [authstatemachine.c:292]: auth_client_statefull_sm_process(): after callback of event 1
pcscf | 94(131) INFO: cdp [authstatemachine.c:292]: auth_client_statefull_sm_process(): after callback of event 17
pcscf | 94(131) INFO: cdp [authstatemachine.c:463]: auth_client_statefull_sm_process(): state machine: AUTH_EV_RECV_STA about to clean up
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf | 107(144) ERROR: <script>: request sent to sip:[email protected]:41786 completed with code: 200, Type 1
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf | 107(144) ERROR: <script>: request sent to sip:[email protected]:41786 completed with code: 200, Type 1
pcscf | 85(122) ERROR: <script>: request sent to sip:[email protected]:40643 completed with code: 408, Type 2
pcscf | 85(122) ERROR: <script>: request sent to sip:[email protected]:40643: Fail Counter is 1
pcscf | 85(122) INFO: cdp [authstatemachine.c:292]: auth_client_statefull_sm_process(): after callback of event 1
pcscf | 94(131) INFO: cdp [authstatemachine.c:292]: auth_client_statefull_sm_process(): after callback of event 17
pcscf | 94(131) INFO: cdp [authstatemachine.c:463]: auth_client_statefull_sm_process(): state machine: AUTH_EV_RECV_STA about to clean up
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf | 85(122) ERROR: <script>: request sent to sip:[email protected]:40643 completed with code: 408, Type 2
pcscf | 85(122) ERROR: <script>: request sent to sip:[email protected]:40643: Fail Counter is 2
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf | 85(122) ERROR: <script>: request sent to sip:[email protected]:41786 completed with code: 408, Type 2
pcscf | 85(122) ERROR: <script>: request sent to sip:[email protected]:41786: Fail Counter is 1
Hey @Hoernchen , thank you very much for details pcaps and logs. I think its a bug in kamailio P-CSCF code. I will see whether I can fix it. In my opinion, wrong Contact of the UE is deleted after pending expires + expire grace.
@Hoernchen Please pull the latest changes from this repo and re-build kamailio images and give it a try (no modifications to expires values is required).
I have been watching and "secretly" using your branch since tuesday, and can confirm that the current repo appears to have fixed the issue and calls just keep working, thanks for your very fast fix!
Just one minor complaint: the new much lower nat ping leads to a lot of "spam" in the docker compose output, it would be great to suppress that in some way unless it fails, because even with just two active phones I kind of mostly just see option messages and have to scroll forever to find anything...
Thanks for confirming the fix.
Just one minor complaint: the new much lower nat ping leads to a lot of "spam" in the docker compose output
Point noted. I will fix the interval.