mod_prometheus

freeswitch_sessions_active - Negative Value

Open JonHVU opened this issue 7 years ago • 9 comments

Hi @moises-silva,

Sorry for the noise but we are testing the module on a quiet box, and the sessions active appear to have a negative value;

# HELP freeswitch_sessions_active FreeSWITCH Active Sessions
freeswitch_sessions_active -197 1495446539381

What could cause this?

Thanks

Jon

JonHVU, May 22 '17 09:05

@JonHVU That's a mismatch between CHANNEL_CREATE/CHANNEL_DESTROY events. It seems the module received more channel destroy events than create. I need to fix that logic to at least validate that and not go below zero. Did you reload the module by any chance? I think this could happen if you do a module reload while there are active calls, because it won't remember the calls that are active already.

moises-silva, May 22 '17 17:05
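The "validate and don't go below zero" idea from the comment above could look roughly like the following. This is a minimal sketch, not the module's actual code: the `ActiveGauge` type and its method names are hypothetical, and it assumes the gauge is driven only by CHANNEL_CREATE and CHANNEL_DESTROY handlers.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the module's active-session gauge.
/// An unsigned counter cannot represent a negative session count.
pub struct ActiveGauge {
    value: AtomicU64,
}

impl ActiveGauge {
    pub const fn new() -> Self {
        ActiveGauge { value: AtomicU64::new(0) }
    }

    /// Intended to be called from a CHANNEL_CREATE handler.
    pub fn on_create(&self) {
        self.value.fetch_add(1, Ordering::Relaxed);
    }

    /// Intended to be called from a CHANNEL_DESTROY handler.
    /// Saturates at zero, so a destroy for a channel that was created
    /// before the module was (re)loaded cannot push the gauge negative.
    pub fn on_destroy(&self) {
        // The closure always returns Some, so the update cannot fail.
        let _ = self
            .value
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| {
                Some(v.saturating_sub(1))
            });
    }

    /// Current value to expose as the gauge reading.
    pub fn get(&self) -> u64 {
        self.value.load(Ordering::Relaxed)
    }
}
```

Clamping only hides the symptom: after a reload with calls in progress the gauge would under-count until those calls end, rather than report a negative number.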

@moises-silva I'm getting the following negative value as well with active registrations:

# HELP freeswitch_registrations_active FreeSWITCH Active Registrations
freeswitch_registrations_active -26236 1499776145514

Total actual registrations on this tested switch is 12.

Any ideas?

Thanks

Troy

socomsystems, Jul 11 '17 12:07

Sadly, it's buggy. I need to rewrite that to query the core explicitly for the registration counts as opposed to relying on events. The problem is starting FreeSWITCH when there's previous state (e.g. registrations already in the db). I hope it's not much of a problem; I'll try to get it done over the weekend.

moises-silva, Jul 11 '17 21:07
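The difference between counting events and querying for the current count can be sketched as below. Everything here is an assumption for illustration: `RegistrationSource`, `RegistrationGauge`, and `FixedSource` are hypothetical names, and the sketch does not show how FreeSWITCH itself would be queried.

```rust
/// Hypothetical abstraction over "ask the core how many registrations
/// exist right now". A real implementation would query FreeSWITCH; that
/// part is deliberately left out of this sketch.
pub trait RegistrationSource {
    fn active_registrations(&self) -> u64;
}

/// Gauge that is overwritten from a snapshot instead of being
/// incremented and decremented on register/unregister events.
pub struct RegistrationGauge {
    value: u64,
}

impl RegistrationGauge {
    pub fn new() -> Self {
        RegistrationGauge { value: 0 }
    }

    /// Replace the stored value with the source's current count.
    /// State that existed before startup is picked up automatically,
    /// and missed events cannot make the value drift.
    pub fn refresh<S: RegistrationSource>(&mut self, source: &S) {
        self.value = source.active_registrations();
    }

    pub fn get(&self) -> u64 {
        self.value
    }
}

/// Test double that reports a fixed count.
pub struct FixedSource(pub u64);

impl RegistrationSource for FixedSource {
    fn active_registrations(&self) -> u64 {
        self.0
    }
}
```

Calling `refresh` on each scrape (or on a timer) keeps the gauge tied to the source of truth, at the cost of one query per refresh.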

@moises-silva thanks for the feedback! I've updated to the most recent commit and am still experiencing the same behavior, as well as other oddities at times, specifically on FS servers in a PostgreSQL BDR multi-master setup that are essentially on standby for fail-over. The primary's registration data replicates to all nodes in the cluster. I've also noted that on the primary, active registrations continue to climb exponentially. Hope this helps.

Video of primary in the cluster: https://www.screencast.com/t/9aF7vw76fKe

Video of a standby: https://www.screencast.com/t/2m54fwOKsj7

I may have the queries done improperly. They are as follows:

  1. Active sessions: freeswitch_sessions_active{instance=~"$node:.*"}

  2. ASR: freeswitch_sessions_asr{instance=~"$node:.*"}

  3. Active Calls (last 12 hours): ((freeswitch_sessions_answered_total{instance=~"$node:.*"} - freeswitch_sessions_failed_total{instance=~"$node:.*"}) / (freeswitch_sessions_answered_total{instance=~"$node:.*"} )) * 100

  4. Active Registrations: freeswitch_registrations_active{instance=~"$node:.*"}

  5. Heartbeats: ((freeswitch_heartbeats_total{instance=~"$node:.*"}) / (freeswitch_heartbeats_total{instance=~"$node:.*"} )) * 100

  6. FreeSWITCH Registrations Total: freeswitch_registrations_total{instance=~"$node:.*"}

Please excuse my ignorance, I'm green with regard to Prometheus, Grafana and Rust for that matter. Loving every minute of this though.

I'm interested in seeing if I can correct this via ESL. As you mentioned, querying the core via a sofia request would be more accurate than log parsing, but I'm vague on exactly how to go about doing it. Your readme makes note of its ability; any chance of a nudge in the right direction? I very much appreciate your quality work and other efforts regarding your project!

Once I get a handle on this, what I'd like to focus on next is seeing registrations on a per-domain basis, e.g. hard-set expected registration counts for each domain, as FS is a great multi-tenant platform. Then alarming on, e.g., a registration loss of 10% or more on a per-domain basis.

Cheers, Troy

socomsystems, Jul 27 '17 08:07
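The per-domain alarm described above comes down to comparing an expected count with an observed count for each domain. A minimal sketch, assuming both counts are already available as maps keyed by domain (the function name, parameter names, and example domains are made up for illustration):

```rust
use std::collections::BTreeMap;

/// Return the domains whose registration loss is at or above
/// `max_loss_percent` of the expected count.
///
/// A domain missing from `actual` is treated as having zero
/// registrations. Domains with an expected count of zero are skipped.
pub fn domains_with_loss(
    expected: &BTreeMap<String, u64>,
    actual: &BTreeMap<String, u64>,
    max_loss_percent: u64,
) -> Vec<String> {
    let mut alerts = Vec::new();
    for (domain, &want) in expected {
        if want == 0 {
            continue;
        }
        let have = actual.get(domain).copied().unwrap_or(0);
        // More registrations than expected counts as zero loss.
        let lost = want.saturating_sub(have);
        // Integer form of: lost / want * 100 >= max_loss_percent
        if lost * 100 >= max_loss_percent * want {
            alerts.push(domain.clone());
        }
    }
    alerts
}
```

With expected counts of 100, 50, and 10 for three domains and observed counts of 95, 45, and none, a 10% threshold flags the second domain (exactly 10% lost) and the third (100% lost), but not the first (5% lost).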

Hello,

I also get negative values for freeswitch_sessions_active. It seemed it started for a while and then it stabilized. I just started testing this. freeswitch_sessions_active is the most important value for us now.

Great work, hopefully it will be fixed!


sfrique, Aug 11 '17 20:08

@moises-silva just checking in on your mod_prometheus, as it's been a while. I reinstalled and recompiled and am still seeing the same possible issues: perpetually climbing active registrations, failures, attempts, registration totals, and heartbeats. Is this behavior as intended? I haven't viewed the new commits, but I'm itching to apply your mod. So much potential! I wish I had time to look into Rust. Is it now functional as presented? Am I misunderstanding the mentioned metrics? Thank you for your contribution. Cheers, Troy

socomsystems, Apr 27 '18 23:04

Yeah, I wish I had the free time to spend on this but I don't. This module was an experiment to get a module written in Rust interfacing with FreeSWITCH. I'll put up a disclaimer in the README indicating it's broken and is only useful as an example of how to get a Rust module built for FreeSWITCH, but the bugs that were found have not been fixed and I can't really commit to when I'll be able to fix them (even more since I have no use for this module myself at the moment).

moises-silva, Apr 30 '18 03:04

Thanks for the feedback @moises-silva

socomsystems, Apr 30 '18 03:04

This issue may be solved by setting the gauges/counters to the values from FreeSWITCH's internal counters.

kvishnivetsky, May 27 '19 16:05