Homer 10 - Call Flow - Sorting Issues/Wrong Order

Open tony1661 opened this issue 1 year ago • 30 comments

I am noticing an issue in Homer 10's Call Flow dashboard. It seems that the Call Flow is not in the correct order which is making troubleshooting difficult.

I have a Freeswitch server (FusionPBX) with the portable heplify agent running heplify -hs homer-stack-ip:9060

If I search by the Call-ID in the SIP headers, I see all SIP messages associated with the call. See below: image

Obviously the INVITE would have occurred before the 407; however, that is not what Grafana shows.

Based on the HEP Flow panel, the messages should be sorted oldest to newest.

If I look into each message I see that the INVITE has a later date than the 407.

INVITE Date image

407 Date image

The issue is similar to what is being experienced in Homer 7's web-ui, which leads me to think the issue may be related to heplify. In Homer 7, when searching for a call, the results are also in the wrong order; however, when I click on the Session ID to view the ladder, it is in the correct order.

Is it possible that the ladder in Homer 7 is referencing a different timestamp that is also being referenced in Homer 10?

tony1661 avatar Dec 29 '23 20:12 tony1661

I've tested using captagent instead of heplify and the issue persists in Homer 10.

tony1661 avatar Dec 30 '23 20:12 tony1661

To add to this issue, I took a packet capture of a call that has the SIP messages displayed in the wrong order in Homer 10.

It seems that in the HEP packets, the Unix Timestamp doesn't change. See below: image

The last two messages (BYE and 200 OK) have a different timestamp, and in Homer they do indeed appear at the bottom of the ladder, but they appear in a different order there than they do in the packet capture.

It may be worth adding the Timestamp μs to the equation.

image
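
To illustrate why the microsecond field matters, here is a minimal sketch. It assumes each HEP message carries a seconds timestamp plus a separate microsecond offset, as the capture above suggests. The class and field names (`HepMessage`, `ts_sec`, `ts_usec`) and all values are made up for illustration and are not the Homer schema.

```python
from dataclasses import dataclass


@dataclass
class HepMessage:
    # Hypothetical field names for illustration; not the Homer schema.
    method: str
    ts_sec: int   # Unix timestamp in whole seconds
    ts_usec: int  # microsecond offset within that second


def sort_key(msg: HepMessage) -> int:
    """Combine seconds and microseconds into microseconds since the epoch."""
    return msg.ts_sec * 1_000_000 + msg.ts_usec


# Made-up messages that all share the same seconds value.
messages = [
    HepMessage("407 Proxy Authentication Required", 1703880000, 250_000),
    HepMessage("INVITE", 1703880000, 100_000),
    HepMessage("ACK", 1703880000, 300_000),
]

# Sorting on seconds alone cannot separate these messages. Python's sort is
# stable, so they simply stay in the order in which they were listed.
by_seconds = [m.method for m in sorted(messages, key=lambda m: m.ts_sec)]

# Including the microsecond offset recovers the intended order.
by_micros = [m.method for m in sorted(messages, key=sort_key)]
```

With identical seconds values, `by_seconds` keeps the 407 ahead of the INVITE, while `by_micros` puts the INVITE first.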

tony1661 avatar Dec 31 '23 03:12 tony1661

Thanks for the report @tony1661. We're investigating and will make sure this is part of the next grafana-flow release. @AlexeyOplachko could you check this after the holidays?

lmangani avatar Dec 31 '23 12:12 lmangani

@AlexeyOplachko let me know if I can help in any way. I can provide logs, pcaps etc.

I have this on a production freeswitch server with heplify and captagent both running.

Hundreds of calls a day that we can look at.

tony1661 avatar Jan 04 '24 17:01 tony1661

Hi @tony1661, we're back next week and we'll most definitely address this.

lmangani avatar Jan 06 '24 09:01 lmangani

Pushed a fix for the Grafana plugin: https://github.com/metrico/grafana-flow/pull/47

It uses the [tsNs] field for more accurate sorting of SIP messages.
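
As a rough sketch of what sorting on a nanosecond field involves (this is not the plugin's actual code, which is in the pull request linked above): the record layout, the assumption that `tsNs` arrives as a decimal string, and the helper name `sort_by_ts_ns` are all illustrative. One detail worth noting is that nanosecond epoch values are too large to survive a conversion to a 64-bit float unchanged, so the sketch parses them as integers.

```python
def sort_by_ts_ns(records):
    """Return records ordered oldest first by nanosecond timestamp.

    Assumes each record is a dict whose 'tsNs' value is nanoseconds since
    the epoch as a decimal string (an assumption made for this sketch).
    """
    # int() keeps every digit; float() would round values of this size
    # to a multiple of 256 ns.
    return sorted(records, key=lambda r: int(r["tsNs"]))


# Made-up records, 10 ns apart.
records = [
    {"method": "200 OK", "tsNs": "1703880000000000030"},
    {"method": "INVITE", "tsNs": "1703880000000000010"},
    {"method": "100 Trying", "tsNs": "1703880000000000020"},
]

ordered = [r["method"] for r in sort_by_ts_ns(records)]

# The precision loss that integer parsing avoids: as floats, two distinct
# timestamps compare equal.
collides_as_float = float(records[1]["tsNs"]) == float(records[2]["tsNs"])
```

Here `ordered` comes out as INVITE, 100 Trying, 200 OK, and `collides_as_float` is true, which is why a float-based key could leave closely spaced messages in arbitrary relative order.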

RFbkak37y3kIY avatar Jan 09 '24 09:01 RFbkak37y3kIY

Hi all,

I saw there was some code merged. If I pull the latest docker images, will I be able to test this?

tony1661 avatar Jan 24 '24 14:01 tony1661

As long as it's using plugin version 10.0.10; you can also update an existing setup.

lmangani avatar Jan 24 '24 14:01 lmangani

@tony1661 here's how

lmangani avatar Jan 24 '24 14:01 lmangani

@lmangani Thanks for your quick response. I tested and the issue seems to still be there. Is there anything I can provide to help? Logs etc

tony1661 avatar Jan 24 '24 15:01 tony1661

For starters, can you please verify that your Grafana indeed got the new plugin version? your_grafana_url/plugins/qxip-flow-panel image

On our side we'll try to replicate this issue today and see if we need anything else from you

AlexeyOplachko avatar Jan 25 '24 09:01 AlexeyOplachko

Hi @AlexeyOplachko ,

I have verified that I have 10.0.10 installed. See below: image

tony1661 avatar Jan 26 '24 13:01 tony1661

Anything I can help with?

tony1661 avatar Mar 13 '24 04:03 tony1661

@AlexeyOplachko please provide an update

lmangani avatar Mar 13 '24 09:03 lmangani

> Anything I can help with?

@tony1661 Can you please provide screenshots of the Message details, with all the info in them, for two messages that are in the incorrect order? image image Also, can you please check whether Sort Items is set? image

AlexeyOplachko avatar Mar 14 '24 10:03 AlexeyOplachko

Hi @AlexeyOplachko, sorry for the delay on this.

Sort Items is set to "Sort by Time: Oldest first".

Here is what the call flow looks like:

image

Here is the first message (INVITE):

image

Here is the second message (200 OK):

image

Here is the fourth message (100 Trying, which is supposed to be second):

image

tony1661 avatar Apr 23 '24 12:04 tony1661

Hi @tony1661, thanks for the reply. It seems like this is not a sorting issue, but an issue with the data.

If you look closely, the message with 100 Trying has a timestamp almost 4 minutes later than the 200 OK. And all three timestamps (one in labels, one in the Time field, and the nanosecond one) show matching data that supports this.
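
The check described above can be sketched roughly as follows. It assumes the three representations are available as a millisecond label value, a datetime, and a nanosecond integer. The names (`timestamps_agree`, `label_ms`, `ts_ns`) and all values are made up for illustration and are not taken from the screenshots.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ms(moment: datetime) -> int:
    """Whole milliseconds since the Unix epoch, using exact integer arithmetic."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def timestamps_agree(label_ms: int, time_field: datetime, ts_ns: int,
                     tolerance_ms: int = 1) -> bool:
    """True if all three representations point at the same instant."""
    return (abs(label_ms - to_ms(time_field)) <= tolerance_ms
            and abs(label_ms - ts_ns // 1_000_000) <= tolerance_ms)


# Made-up values: a 200 OK, and a 100 Trying stored 235 seconds later.
ok = {
    "label_ms": 1713873600000,
    "time": datetime.fromtimestamp(1713873600, tz=timezone.utc),
    "ts_ns": 1713873600000000000,
}
trying = {
    "label_ms": 1713873835000,
    "time": datetime.fromtimestamp(1713873835, tz=timezone.utc),
    "ts_ns": 1713873835000000000,
}

# Each message is internally consistent across its three timestamps...
consistent = all(
    timestamps_agree(m["label_ms"], m["time"], m["ts_ns"]) for m in (ok, trying)
)
# ...and the 100 Trying really is stored later than the 200 OK.
gap_s = (trying["ts_ns"] - ok["ts_ns"]) / 1_000_000_000
```

If every message passes this kind of check and the gap is still implausible, the panel is sorting the stored values correctly, and the wrong timestamps were written somewhere earlier in the pipeline.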

AlexeyOplachko avatar Apr 25 '24 13:04 AlexeyOplachko

@AlexeyOplachko Yea something seems off with the data. The issue happens on multiple HEP clients (heplify and captagent)

I have some screenshots from pcaps above that may assist.

I am running freeswitch (via FusionPBX)

tony1661 avatar Apr 25 '24 17:04 tony1661

@AlexeyOplachko Could you check if you ended up fixing this issue?

Dletta avatar Jun 21 '24 13:06 Dletta

@Dletta are you also experiencing this issue?

tony1661 avatar Jun 21 '24 13:06 tony1661

> @AlexeyOplachko Could you check if you ended up fixing this issue?

Yes, from standpoint of our frontend there is no way for it to sort incorrectly, so it's only an issue with data

AlexeyOplachko avatar Jun 21 '24 13:06 AlexeyOplachko

@tony1661

I am not experiencing the same issue. I work with Alexey and wanted to make sure we don't let this issue go stale :)

Dletta avatar Jun 21 '24 13:06 Dletta

> @AlexeyOplachko Could you check if you ended up fixing this issue?
>
> Yes, from standpoint of our frontend there is no way for it to sort incorrectly, so it's only an issue with data

What can I do to assist? I've used multiple HEP agents and get the same results. Homer 7 does not have the issue with the same data source.

tony1661 avatar Jun 21 '24 17:06 tony1661

Hi @lmangani, is there any progress on this? I really want to be able to move over to Homer 10 if possible 🙂

tony1661 avatar Aug 20 '24 19:08 tony1661