SIP messages out-of-order in Call Flow

Open jcabezas61 opened this issue 3 years ago • 6 comments

Hi, Very often (but not always) when running sngrep on my OpenSIPS server, the SIP messages in the Call Flow window are badly out of order. I'm not sure, but it seems that all the messages are displayed, just out of order.

sngrep version = 1.4.6
OS = Ubuntu 20.04.2 LTS

[Screenshot: sngrep-out-of-order-blurred]

Thanks, Julio

jcabezas61 avatar Jul 16 '22 22:07 jcabezas61

Hi Julio!

It looks like the time sort function is not working as expected, because there are negative time diffs in the left column.

Can this be reproduced with an offline pcap file? Could you send me one to debug the issue?
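
The negative time diffs can also be checked outside sngrep once a pcap exists. A minimal sketch, assuming a classic pcap file (not pcapng) and using only the Python standard library: it walks the per-packet record headers and reports every packet whose timestamp is earlier than that of the packet stored just before it. The name `find_negative_deltas` is made up for this example and is not part of sngrep.

```python
import struct

# Magic numbers of the classic pcap format, mapped to timestamp ticks per second.
PCAP_MAGICS = {
    0xA1B2C3D4: 1_000_000,      # microsecond-resolution timestamps
    0xA1B23C4D: 1_000_000_000,  # nanosecond-resolution timestamps
}


def find_negative_deltas(data: bytes):
    """Return [(packet_number, delta_seconds), ...] for each packet whose
    timestamp is earlier than that of the previous packet in the file.
    Packet numbers are 1-based, like Wireshark frame numbers."""
    if len(data) < 24:
        raise ValueError("too short to be a pcap file")
    # The magic number reveals the byte order the file was written in.
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    ticks_per_sec = PCAP_MAGICS[magic]

    offset = 24  # skip the 24-byte global header
    previous = None
    negatives = []
    number = 0
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length.
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        number += 1
        # Integer ticks avoid floating-point rounding when comparing.
        ticks = ts_sec * ticks_per_sec + ts_frac
        if previous is not None and ticks < previous:
            negatives.append((number, (ticks - previous) / ticks_per_sec))
        previous = ticks
        offset += 16 + incl_len
    return negatives
```

Reading the exported file in binary mode and passing its bytes to `find_negative_deltas` gives the list of offending packets. An empty list would suggest that the packets are stored in timestamp order and the problem lies in how they are sorted or displayed; a non-empty list would suggest they were already written out of order.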

Thanks!

Kaian avatar Jul 18 '22 07:07 Kaian

Hi,

Here is a "bad call" as seen on screen and in the exported pcap. For the export I selected only the desired bad call, but the pcap includes many other SIP messages (that were flowing through the server during the capture). You can select the relevant call messages in Wireshark using a filter like "sip.Call-ID~MWQ2"

I noticed a few things about the problem while using sngrep:

1- In my experience, the same sngrep installation on the same server works fine (message order correct) for some minutes or hours over the course of a day, then starts to scramble things for some more minutes or hours, and then works fine again. It forms a succession of cycles of good and bad behaviour.

2- I have not yet been able to work out the duration of those cycles, or what triggers the change from good to bad and vice versa.

3- Besides the scrambled order of the messages in the displayed call flow, during a capture I frequently see that some messages take a random number of seconds to appear in the flow, and some appear after later messages have already been rendered on screen.

4- My procedure to produce the .pcap is to select just the one call that I want to export. It seems that a "problematic call" is written to the pcap together with several other messages that do not belong to the selected call. On the other hand, a "good call" export contains strictly the messages that are part of the call and no others.

Thanks

[Screenshot: out-of-order_19-07-22]

Link to pcap: https://www.dropbox.com/s/ytyewxwm5rs4yoy/out-of-order_19-07-22.pcap?dl=0.

jcabezas61 avatar Jul 21 '22 02:07 jcabezas61

Hi, any news on this issue? BR

jcabezas61 avatar Aug 09 '22 19:08 jcabezas61

Hi!

Sorry, I've been on holiday these past weeks.

I've tested the attached pcap and the message order seems OK in both sngrep 1.4.6 and 1.5.0. Although the order is OK, the flow shows lots of messages that are probably packet retransmissions.

sngrep does not support TCP retransmission packets (#102); they are handled like normal packets, so flows may end up with a lot of duplicated arrows.
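
To illustrate why unhandled retransmissions show up as duplicated arrows: a retransmitted segment repeats the same TCP sequence number and payload length on the same connection, so a tool that treats every segment as new sees the same SIP message twice. Below is a rough sketch of how such repeats could be spotted in a pcap, assuming a classic pcap file with untagged Ethernet frames carrying IPv4. This is not how sngrep or Wireshark analyse TCP; the helper names are hypothetical, and the heuristic ignores sequence-number wraparound and IP fragmentation.

```python
import struct


def iter_pcap_frames(data: bytes):
    """Yield (linktype, frame_bytes) for each record of a classic pcap file."""
    (magic_le,) = struct.unpack("<I", data[:4])
    if magic_le in (0xA1B2C3D4, 0xA1B23C4D):
        endian = "<"
    elif magic_le in (0xD4C3B2A1, 0x4D3CB2A1):  # same magics, byte-swapped
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    offset = 24
    while offset + 16 <= len(data):
        _sec, _frac, incl_len, _orig = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield linktype, data[offset:offset + incl_len]
        offset += incl_len


def tcp_segment_key(frame: bytes):
    """Return a tuple identifying a TCP segment that carries payload,
    or None for anything else (non-IPv4, non-TCP, pure ACKs)."""
    if len(frame) < 14 + 20 or frame[12:14] != b"\x08\x00":
        return None  # not an untagged Ethernet II frame carrying IPv4
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4
    if ip[0] >> 4 != 4 or ip[9] != 6:
        return None  # not IPv4 or not TCP
    (total_len,) = struct.unpack("!H", ip[2:4])
    tcp = ip[ihl:total_len]  # total_len also strips Ethernet padding
    if len(tcp) < 20:
        return None
    sport, dport, seq = struct.unpack("!HHI", tcp[:8])
    payload_len = len(tcp) - (tcp[12] >> 4) * 4
    if payload_len <= 0:
        return None
    return (ip[12:16], ip[16:20], sport, dport, seq, payload_len)


def find_repeated_segments(data: bytes):
    """Return 1-based packet numbers that repeat an already seen segment."""
    seen = set()
    repeats = []
    for number, (linktype, frame) in enumerate(iter_pcap_frames(data), 1):
        if linktype != 1:  # 1 = LINKTYPE_ETHERNET
            raise ValueError("only Ethernet captures are handled here")
        key = tcp_segment_key(frame)
        if key is None:
            continue
        if key in seen:
            repeats.append(number)
        else:
            seen.add(key)
    return repeats
```

Note that a capture taken on Linux's `any` pseudo-interface uses a different link type (Linux cooked capture) and would need its own header handling.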

Maybe the problem is totally related to TCP dialogs?

Regards

Kaian avatar Aug 10 '22 07:08 Kaian

Hi,

You ask "Maybe the problem is totally related to TCP dialogs?" and I don't know what to say, but the fact is that for some periods (see below) sngrep handles TCP-based dialogs well. By the way, all my important SIP traffic is TCP.

Let's make a fresh assessment of the problem as we know it today:

There are time intervals (periods that can last for minutes or more) when all successive sngrep captures seem flawless

  • all message-flows appear correctly ordered with no missing messages
  • if you save (F2) any single selected call and open the resulting .pcap in Wireshark or sngrep, you get back the original selected flow, wonderful!
  • this recovered flow has all the original messages and no messages associated with any other Call-ID
  • Let's call this a "healthy capture" occurring inside a "healthy capturing period"

But there are time intervals (periods that last for minutes/hours) when all sngrep captures are defective

  • message flows appear out of order, and messages that we know existed (because the call succeeded) are missing from the flow
  • some time differences between messages appear negative
  • if you save (F2) just one selected call and open the resulting .pcap in Wireshark or sngrep, you DO NOT get back the original flow!
  • besides the messages of the selected call, messages belonging to other, unwanted Call-IDs are saved into the .pcap
  • Let's call this a "sick capture" occurring inside a "sick capturing period"

Also I observed that:

  • you can be inside a "healthy capturing period" and suddenly it becomes a "sick capturing period"
  • restarting does not help: if you exit sngrep during a "sick period" and start it again, you do not get a "healthy period"
  • I have never found out what can be done to avoid a "sick period" or to end one.

What could be the next step towards understanding this? Or is there some new experiment to try?

BR.

jcabezas61 avatar Aug 12 '22 02:08 jcabezas61

Hi!

My guess is that the periods with defective captures are caused by network errors that generate TCP retransmissions. When those retransmissions occur, sngrep handles them as normal packets, causing errors in the flows (because, as mentioned earlier, it only supports flawless TCP streams).

One approach would be to try to reproduce this with an offline capture. Capture all the traffic at the same time with another raw capture tool such as tcpdump and, as soon as sngrep fails, stop the capture and check whether there have been errors in the TCP streams. Configure tcpdump to rotate captures so that there is only a small number of packets to analyze. Opening that capture with sngrep will probably cause the same defective behaviour.
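
The rotation suggested above can be done with tcpdump's `-C` option (maximum file size, in millions of bytes) together with `-W` (number of files to keep, overwritten in turn as a ring buffer). A small sketch that builds such a command line; the interface name `eth0`, port 5060, the sizes and the output path are all assumptions to adapt to the actual server, and the function name is made up for this example.

```python
import shlex


def tcpdump_ring_command(interface="eth0", sip_port=5060,
                         file_mb=50, file_count=20,
                         out_path="/tmp/sip-ring.pcap"):
    """Build a tcpdump argument list that keeps a bounded ring of capture files.

    All defaults are placeholders, not values taken from the affected server.
    """
    return [
        "tcpdump",
        "-n",                    # do not resolve addresses to names
        "-i", interface,         # capture interface (assumed name)
        "-s", "0",               # capture whole packets, not just headers
        "-C", str(file_mb),      # start a new file after this many million bytes
        "-W", str(file_count),   # keep at most this many files, then overwrite
        "-w", out_path,          # base name of the capture files
        f"tcp port {sip_port}",  # capture filter: SIP over TCP (assumed port)
    ]


# Print the command in a form that can be pasted into a shell.
print(shlex.join(tcpdump_ring_command()))
```

The resulting list can also be passed directly to `subprocess.run` (as root) alongside a running sngrep; once sngrep starts misbehaving, stopping tcpdump leaves only the most recent files to inspect.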

Regards!

Kaian avatar Aug 16 '22 07:08 Kaian