trunk-recorder icon indicating copy to clipboard operation
trunk-recorder copied to clipboard

controlWarnUpdate rate

Open rosecitytransit opened this issue 2 years ago • 2 comments

Control how often the Control Channel Decoding Rate is logged; sometimes you want it every 3 seconds to test things, other times once a minute suffices. Keeps the actual rate checking at 3 second intervals.

Originally https://github.com/robotastic/trunk-recorder/commit/2d6a95484f3919a9ff4af4cde92db315d847ff04

rosecitytransit avatar Apr 11 '22 03:04 rosecitytransit

I am a little confused on this approach - it looks like this changes it so it will only perform the check at a configurable interval now, right? The challenge with this is if you have very bursty changes you would miss short interval drops. Like it would be checking a 3 sec interval every 30 sec.

What is the overall goal? With the threshold, it should only print this messages when things are bad and this shouldn't be too often. Are you looking to instead occasionally have an output of the channel decode even when things are above the threshold?

robotastic avatar Apr 30 '22 13:04 robotastic

It's supposed to leave the internal checking (to make sure control channel messages are still coming in) at 3 seconds but only output the "Control Channel Message Decode Rate" messages to the log file if it's been X seconds since last doing it (e.g. if set to 60, only output every 20 checks) . The goal is to reduce the size of the log files a bit yet still log the rate to see how decoding is going. And leave the option to have the rate be output every 3 seconds for antenna positioning/testing/etc.

rosecitytransit avatar Apr 30 '22 19:04 rosecitytransit

Are you looking to instead occasionally have an output of the channel decode even when things are above the threshold?

Correct, I use the data for the decoding rate graph and to confirm every thing is working OK

rosecitytransit avatar Mar 26 '23 23:03 rosecitytransit

@robotastic you want to merge this in?

Also, I was wanting to have it calculate the rate since the last log message (e.g. during the last 60 seconds if controlWarnUpdate is set to 60) rather than always just during the last 3 seconds, but it looks like the message count gets reset every check.

And is there a reason why the iteration at https://github.com/robotastic/trunk-recorder/blob/master/trunk-recorder/main.cc#L1499 can't be combined with the iteration in check_message_count at https://github.com/robotastic/trunk-recorder/blob/master/trunk-recorder/main.cc#L1379 ?

rosecitytransit avatar Apr 02 '23 16:04 rosecitytransit

And is there a reason why the iteration at https://github.com/robotastic/trunk-recorder/blob/master/trunk-recorder/main.cc#L1499 can't be combined with the iteration in check_message_count at https://github.com/robotastic/trunk-recorder/blob/master/trunk-recorder/main.cc#L1379 ?

Yea - it might be possible to combine the 2, or move the Stale Patch checking to its own function, but lets leave that alone for now.

I think there maybe 3 different values at play - how often we would want to display the warning message, the period of time over which the average should be calculated and the message rate value that triggers if the message should be displayed.

There are probably 2 different scenarios where this would be used:

  • I want to get an error message right away if my message rate drops below a value and I don't want the average to be over a long time because it might miss spikes.
  • I want to occasionally get the decode rate so I can keep an eye on things. Averaging period might not matter as much here.

I sort of arbitrarily picked 3 seconds as the period of time to average over.... not sure that is right, but maybe it is good enough.

It would be nice to support both use cases at once, print the message every x seconds if the rate is good and print it right away if it is bad.

Let me see if I can rework things to support that

robotastic avatar Apr 03 '23 13:04 robotastic

I sort of arbitrarily picked 3 seconds as the period of time to average over.... not sure that is right, but maybe it is good enough.

Three seconds is okay, but it has a tendency to push repeating decimals like 31.3333333333333333 across via MQTT or the status server. I don't know if it truly needs a configurable setting in the .json, but a four or five-second interval would likely provide floats that are a little nicer to display and work with.

taclane avatar Apr 03 '23 13:04 taclane

that is gross!! Hmm... sounds like it might be good to upgrade the MQTT / Status Server to do a little bit of rounding after a certain precision.

robotastic avatar Apr 03 '23 14:04 robotastic

Thanks for getting back to this.

I want to get an error message right away if my message rate drops below a value

What if I change if (msgs_decoded_per_second < config.control_message_warn_rate || config.control_message_warn_rate == -1) { instead to if (msgs_decoded_per_second < config.control_message_warn_rate || (config.control_message_warn_rate == -1 && warn == true) {? I think this would log the rate every 3 second interval if it's below the controlWarnRate threshold AND always log it at the controlWarnUpdate setting. The second half could be just || (warn == true) or maybe || (config.control_message_warn_rate == -1) || (warn == true) by differentiating between an explicitly set controlWarnUpdate and the default.

I sort of arbitrarily picked 3 seconds as the period of time to average over.... not sure that is right, but maybe it is good enough.

Well, the 3 seconds seems to come from how long the program wants a lack of messages for to decide to retune. I think it's a fine amount.

rosecitytransit avatar Apr 03 '23 15:04 rosecitytransit

Let me know what you thing about the changes.

robotastic avatar Apr 04 '23 01:04 robotastic

Since it's an or || in line 1410, won't this still print out low message decode rates regardless of the value of print_message_rate?

Additionally, I'd suggest controlMessageUpdate default to -1 so that the behavior doesn't change unless someone specifically adds a controlMessageUpdate option.

Finally, this seems like something that should be handled outside of the trunk-recorder logs... I don't understand what the use case would be for expecting bad decode rates and not writing an error to the console? If decode rates are that low, a number of other issues are going to pop up and be even harder to troubleshoot.

tadscottsmith avatar Apr 04 '23 02:04 tadscottsmith

@tadscottsmith the behavior I was aiming for was to print out the current decode rate whenever it falls below the threshold and also every 30 seconds or so... did I mess that up? I have been doing JS for the past month so my C++ is rusty.

You are probably right though that instead of defaulting to 30 seconds, it should be -1 and people can optionally enable it.

The target use for this would be spotting when the decode is low, but is working well enough to capture most of the system. Could be a helpful signal that you need to adjust something... but fair question - maybe it would be better to have this be an MQTT thing

robotastic avatar Apr 04 '23 02:04 robotastic

My two cents and my two cents only:

I really would be against printing an "error" level message if things are properly working. I have gotten to the point where I go days and hundreds of thousands of calls without an error, and I think that's a good thing.

Do we really need something more granular in the logs than when the print_status() runs? The plugin system already gets the messages every 3 seconds.

tadscottsmith avatar Apr 04 '23 03:04 tadscottsmith

I do think it may be a good idea not to change the defaults.

did I mess that up?

Line 1410 looks OK to me. It's a simple if statement. Note that some may set the decoding threshold (controlWarnRate) high, and want logging when there's even a modest decrease.

I really would be against printing an "error" level message if things are properly working.

I'm guessing that the idea is to always print the message even if logLevel is set to error, e.g. if one just wants to make sure every thing is working and not any details. info might be more technically appropriate.

The plugin system already gets the messages every 3 seconds.

If the plugin system gets it, and there's a way to save the rate to a file, I'd be fine just using that and turning off the messages in the logs completely. I'd just have to modify the decode rate chart's data source.

rosecitytransit avatar Apr 04 '23 05:04 rosecitytransit

Would it make more sense to add a section in print_status() that prints the current decode rate for each system, and then make the timing of how often print_status() runs a user option?

tadscottsmith avatar Apr 04 '23 17:04 tadscottsmith

Would it make more sense to add a section in print_status() that prints the current decode rate for each system, and then make the timing of how often print_status() runs a user option?

Maybe we should just do that and then only print the message every 3 seconds if it's below controlWarnRate. Would simplify things, add flexibility and maybe help with debugging. The only downside is that (especially if you have a ton of recorders) it would mean more log file lines if doing it often, but I'd be willing to just address that in my copy (if I can't just use a plugin).

rosecitytransit avatar Apr 04 '23 17:04 rosecitytransit

Would it make more sense to add a section in print_status() that prints the current decode rate for each system, and then make the timing of how often print_status() runs a user option?

This is the WAY! That makes a lot of sense. Also great point about having printing out using an Error level when it really isn't an error.... I didn't pick up on that. Let me go make this switch.

robotastic avatar Apr 05 '23 15:04 robotastic

I was going to offer up to code it if there was consensus. My two thoughts are just:

*if it's print_status() time, don't output the decoding rate if it's low since the print_status() is going to do it

*it should maybe be error level if the decoding rate is below controlWarnRate (something bad is happening, though it may be external to the program), otherwise info like the current the status message.

I have gotten to the point where I go days and hundreds of thousands of calls without an error, and I think that's a good thing.

I don't ever really look at the log fines unless something has happened (e.g. issue w/control channel change. which is being addressed in 792)

Lastly, I'm not seeing a plugin to get the decoding rate

rosecitytransit avatar Apr 05 '23 15:04 rosecitytransit

Ok - I made the change so it is part of the Status message that gets printed out. I am going to leave it at the current fixed delay - which is 200 seconds. No one has asked for it to be more or less ever or to configure it, so let's see how it looks. Check the rate-status branch

robotastic avatar Apr 06 '23 00:04 robotastic

Being able to control the status updates might be a nice to have. As I said, I'm just wanting a way to get the rates for the decode rate chart yet not have them be logged 20 times (60/3) more than necessary (though the chart could easily be changed to handle more than 1 point per minute).

Lastly, if you haven't noticed, I also included a config entry for the logDir option.

rosecitytransit avatar Apr 06 '23 17:04 rosecitytransit

If you want to leave things as-is; I'd be willing to just customize things on my end and just submit the logDir config entry (unless you want to write that, in which case I could close this).

one other minor, somewhat related issue: I think the timestamp for last_decode_rate_check should be first set at 0 here: https://github.com/robotastic/trunk-recorder/blob/7bb5742a88d816d4ce3f8a68053d70e2e9cffbc8/trunk-recorder/main.cc#L1433

Now, there's a delay at startup because the program waits for the 3 seconds of no control channel messages to do a retune.

rosecitytransit avatar Apr 09 '23 18:04 rosecitytransit

Well, although it meant another day delay in getting things upgraded, I've decided to switch to logging the rates in a separate file. See https://github.com/robotastic/trunk-recorder/commit/0cd6c66c240abf2fd20017bea6e36991950d4695

While it means another file, it will mean not having to grep through what can be very large log files, not needing to keep (or even have) log files, and save size vs the multiple "Control Channel Message Decode Rate:" messages per cycle. Need to still update the decoding chart.

I can change this PR to that method if this is desirable. I now realize there is a (undocumented?) plugman_system_rates(systems, timeDiff); method to put it in a plugin; I like the efficiency using the existing for loop. Also, while I did use the exiting last_status_time variable and turn off the status message printing (personal preference now), creating another variable is a very simple thing.

rosecitytransit avatar Apr 11 '23 17:04 rosecitytransit

I now realize there is a (undocumented?) plugman_system_rates(systems, timeDiff); method to put it in a plugin; I like the efficiency using the existing for loop.

Yup. There's a lot of power in the plugin architecture to solve exactly something like this. Here is an example usage in the MQTT plugin

https://github.com/robotastic/trunk-recorder-mqtt-status/blob/d37d02ea131dc758f079ee88181bcb2e42a47a48/mqtt_status_plugin.cc#L85

but it's likely pretty simple to build a single-purpose plugin to only output system_rates wherever and however frequently one might want.

taclane avatar Apr 11 '23 18:04 taclane

Sorry - I have been over focused on trying to debug why my OpenMHz code isn't working.

I don't see the logDir config as part of this PR. Would you create a separate PR for that?

The creation of a log file for building a decode rate chart is best handled by a plugin. Let me know if there are any kind of changes required to the plugin system to help support it. I think everything should be there...

For the last_decode_rate_check wouldn't you have to wait 3 seconds anyhow to accumulate enough messages to see if you can receive things well enough or not?

robotastic avatar Apr 12 '23 02:04 robotastic

No problem and thanks, as I've noticed some wonkyness with the site. As for logDir, see https://github.com/robotastic/trunk-recorder/pull/666/commits/bc23e8805c388c7c6eb557706a6d18151f098782 I can put it in another PR if we're unsure about this one. The actual functionality should already exist.

As for decode rate logging, would you like me to switch this PR to using the method in https://github.com/robotastic/trunk-recorder/commit/0cd6c66c240abf2fd20017bea6e36991950d4695 , and then someone (maybe me) can move it to use a plugin?

Yup. There's a lot of power in the plugin architecture to solve exactly something like this.

If it was me I'd just have a lot of places where it could optionally run a script with certain variables, but I agree that having access to the actual internal data is useful, and adding code in a separate plugin file prevents potential future conflicts.

For the last_decode_rate_check wouldn't you have to wait 3 seconds anyhow to accumulate enough messages to see if you can receive things well enough or not?

But at startup there aren't any messages. What I'm seeing is it gets set to the current time at https://github.com/robotastic/trunk-recorder/blob/7bb5742a88d816d4ce3f8a68053d70e2e9cffbc8/trunk-recorder/main.cc#L1433 then compared to the current time-3 secs at https://github.com/robotastic/trunk-recorder/blob/7bb5742a88d816d4ce3f8a68053d70e2e9cffbc8/trunk-recorder/main.cc#L1505-L1507 At startup, the correct last check time is never.

rosecitytransit avatar Apr 12 '23 04:04 rosecitytransit

This is now a plugin that logs the decoding rates to decode_rates.csv in the capture_dir. A header line is put in the file each time the program started. This only logs the rate over the last 3 seconds; I am thinking of having it calculate the rates over the entire interval by saving and adding to the numbers each time the plugin is called.

rosecitytransit avatar May 04 '23 08:05 rosecitytransit

@robotastic this should be good to go, except for the rates being for only the proceeding 3 seconds. I know I could create a vector in the plugin to save the rates if it's not time to log yet, but I'd like a separate counter (or maybe plugin call) comparable to sys->set_message_count() https://github.com/robotastic/trunk-recorder/blob/e33e441278ae130a6884d464a5d75b97655ca5fd/trunk-recorder/main.cc#L1480 so it could be independent of check_message_count() and log at any chosen interval.

I was thinking of maybe creating a plugin call in handle_message() https://github.com/robotastic/trunk-recorder/blob/e33e441278ae130a6884d464a5d75b97655ca5fd/trunk-recorder/main.cc#L1228 which could be useful for other ways to handle trunking messages. Your thoughts?

Also, ignore what I said about the last_decode_rate_check; on closer inspection it's not waiting for the time_diff at startup like I thought it was.

rosecitytransit avatar May 07 '23 04:05 rosecitytransit

Thinking about and looking at things, how about I (maybe in a separate issue, before this one gets addressed) create a handle_message plugin that passes the message and system and then move the following 2 sections that are only used for unit script to that plugin? I'm guessing things could be simplified by having one function with an if statement to set the actual text sent. I know this would be a change, but it would move the stuff out of trunk-recorder, the non-call plugins aren't documented anyways, and I don't think there's going to be a lot of demand for one dedicated to e.g. acknowledge responses. https://github.com/robotastic/trunk-recorder/blob/592e551f9f7153986ab2f439e4bbc3e0f54e8a38/trunk-recorder/main.cc#L1257-L1300 https://github.com/robotastic/trunk-recorder/blob/592e551f9f7153986ab2f439e4bbc3e0f54e8a38/trunk-recorder/main.cc#L964-L996

rosecitytransit avatar May 10 '23 04:05 rosecitytransit

Hmm... good question. It probably makes sense to leave that processing part of Trunk Recorder. It makes it obvious for people writing plugins that those message types exist. Over on the Discord server they are working on an MQTT plugin that pulls in lots of these data points. What I should probably do is update the Docs so people know that these hooks exist.

But, it might also be good to create a plugin hook that just passes the message string and system to the plugin - depending on what you are doing it might be easier to just have a switch statement in your plugin instead of a lot of functions.

robotastic avatar May 10 '23 12:05 robotastic

so it could be independent of check_message_count() and log at any chosen interval. I was thinking of maybe creating a plugin call in handle_message()

https://github.com/robotastic/trunk-recorder/blob/e33e441278ae130a6884d464a5d75b97655ca5fd/trunk-recorder/main.cc#L1228

Good point - this call could also be used for just counting the number of messages.

When you plugin is ready to go, can you roll it into its own Repo? I have a few plugins I created that could be examples: https://github.com/robotastic/trunk-recorder-mqtt-status https://github.com/robotastic/trunk-recorder-mqtt-statistics Having a separate repo keeps the build times down and let you make updates without having to do a PR.

This reminds me that I also have to create a section in the README to track the Plugin Repos.... I will go do that now. Once you have your repo, just submit a PR with a link to the repo.

robotastic avatar May 10 '23 12:05 robotastic

It makes it obvious for people writing plugins that those message types exist.

I would just have documentation that they do, or point people to the plugin.

Over on the Discord server

I know I should give in and create an account and go through all of/join the discussion there (why doesn't GitHub have a chat function? Or at least a discussion section just like "Issues" but not for specific bugs/feature requests/etc.)

depending on what you are doing it might be easier to just have a switch statement in your plugin instead of a lot of functions.

Exactly. I said "if statement" but switch would be more appropriate if handling a lot of types.

Good point - this call could also be used for just counting the number of messages.

Which is just what I want to do. :)

Having a separate repo keeps the build times down and let you make updates without having to do a PR.

As long as there's simple, explicit instructions on how to add external plugins (how to merge the code in). I'll modify this PR to just be a handle_message plugin hook and the changes to README.md I made. But let me know if you decide that the unit change stuff should be moved too. I could create a repo for that too.

EDIT: looks like you'd just need a "if including, add_plugin" here https://github.com/robotastic/trunk-recorder/blob/0dbe2f3e3974b3e959394452f7fec808e7d0f470/trunk-recorder/main.cc#L574 and "if exists add_directory" here https://github.com/robotastic/trunk-recorder/blob/0dbe2f3e3974b3e959394452f7fec808e7d0f470/CMakeLists.txt#L331

rosecitytransit avatar May 11 '23 17:05 rosecitytransit