trunk-recorder
trunk-recorder copied to clipboard
controlWarnUpdate rate
Control how often the Control Channel Decoding Rate is logged; sometimes you want it every 3 seconds to test things, other times once a minute suffices. Keeps the actual rate checking at 3 second intervals.
Originally https://github.com/robotastic/trunk-recorder/commit/2d6a95484f3919a9ff4af4cde92db315d847ff04
I am a little confused on this approach - it looks like this changes it so it will only perform the check at a configurable interval now, right? The challenge with this is if you have very bursty changes you would miss short interval drops. Like it would be checking a 3 sec interval every 30 sec.
What is the overall goal? With the threshold, it should only print this messages when things are bad and this shouldn't be too often. Are you looking to instead occasionally have an output of the channel decode even when things are above the threshold?
It's supposed to leave the internal checking (to make sure control channel messages are still coming in) at 3 seconds but only output the "Control Channel Message Decode Rate" messages to the log file if it's been X seconds since last doing it (e.g. if set to 60, only output every 20 checks) . The goal is to reduce the size of the log files a bit yet still log the rate to see how decoding is going. And leave the option to have the rate be output every 3 seconds for antenna positioning/testing/etc.
Are you looking to instead occasionally have an output of the channel decode even when things are above the threshold?
Correct, I use the data for the decoding rate graph and to confirm every thing is working OK
@robotastic you want to merge this in?
Also, I was wanting to have it calculate the rate since the last log message (e.g. during the last 60 seconds if controlWarnUpdate
is set to 60) rather than always just during the last 3 seconds, but it looks like the message count gets reset every check.
And is there a reason why the iteration at https://github.com/robotastic/trunk-recorder/blob/master/trunk-recorder/main.cc#L1499 can't be combined with the iteration in check_message_count
at https://github.com/robotastic/trunk-recorder/blob/master/trunk-recorder/main.cc#L1379 ?
And is there a reason why the iteration at https://github.com/robotastic/trunk-recorder/blob/master/trunk-recorder/main.cc#L1499 can't be combined with the iteration in check_message_count at https://github.com/robotastic/trunk-recorder/blob/master/trunk-recorder/main.cc#L1379 ?
Yea - it might be possible to combine the 2, or move the Stale Patch checking to its own function, but lets leave that alone for now.
I think there maybe 3 different values at play - how often we would want to display the warning message, the period of time over which the average should be calculated and the message rate value that triggers if the message should be displayed.
There are probably 2 different scenarios where this would be used:
- I want to get an error message right away if my message rate drops below a value and I don't want the average to be over a long time because it might miss spikes.
- I want to occasionally get the decode rate so I can keep an eye on things. Averaging period might not matter as much here.
I sort of arbitrarily picked 3 seconds as the period of time to average over.... not sure that is right, but maybe it is good enough.
It would be nice to support both use cases at once, print the message every x seconds if the rate is good and print it right away if it is bad.
Let me see if I can rework things to support that
I sort of arbitrarily picked 3 seconds as the period of time to average over.... not sure that is right, but maybe it is good enough.
Three seconds is okay, but it has a tendency to push repeating decimals like 31.3333333333333333
across via MQTT or the status server. I don't know if it truly needs a configurable setting in the .json, but a four or five-second interval would likely provide floats that are a little nicer to display and work with.
that is gross!! Hmm... sounds like it might be good to upgrade the MQTT / Status Server to do a little bit of rounding after a certain precision.
Thanks for getting back to this.
I want to get an error message right away if my message rate drops below a value
What if I change if (msgs_decoded_per_second < config.control_message_warn_rate || config.control_message_warn_rate == -1) {
instead to if (msgs_decoded_per_second < config.control_message_warn_rate || (config.control_message_warn_rate == -1 && warn == true) {
? I think this would log the rate every 3 second interval if it's below the controlWarnRate threshold AND always log it at the controlWarnUpdate setting. The second half could be just || (warn == true)
or maybe || (config.control_message_warn_rate == -1) || (warn == true)
by differentiating between an explicitly set controlWarnUpdate and the default.
I sort of arbitrarily picked 3 seconds as the period of time to average over.... not sure that is right, but maybe it is good enough.
Well, the 3 seconds seems to come from how long the program wants a lack of messages for to decide to retune. I think it's a fine amount.
Let me know what you thing about the changes.
Since it's an or ||
in line 1410, won't this still print out low message decode rates regardless of the value of print_message_rate
?
Additionally, I'd suggest controlMessageUpdate
default to -1
so that the behavior doesn't change unless someone specifically adds a controlMessageUpdate
option.
Finally, this seems like something that should be handled outside of the trunk-recorder logs... I don't understand what the use case would be for expecting bad decode rates and not writing an error to the console? If decode rates are that low, a number of other issues are going to pop up and be even harder to troubleshoot.
@tadscottsmith the behavior I was aiming for was to print out the current decode rate whenever it falls below the threshold and also every 30 seconds or so... did I mess that up? I have been doing JS for the past month so my C++ is rusty.
You are probably right though that instead of defaulting to 30 seconds, it should be -1 and people can optionally enable it.
The target use for this would be spotting when the decode is low, but is working well enough to capture most of the system. Could be a helpful signal that you need to adjust something... but fair question - maybe it would be better to have this be an MQTT thing
My two cents and my two cents only:
I really would be against printing an "error" level message if things are properly working. I have gotten to the point where I go days and hundreds of thousands of calls without an error, and I think that's a good thing.
Do we really need something more granular in the logs than when the print_status() runs? The plugin system already gets the messages every 3 seconds.
I do think it may be a good idea not to change the defaults.
did I mess that up?
Line 1410 looks OK to me. It's a simple if statement. Note that some may set the decoding threshold (controlWarnRate) high, and want logging when there's even a modest decrease.
I really would be against printing an "error" level message if things are properly working.
I'm guessing that the idea is to always print the message even if logLevel is set to error,
e.g. if one just wants to make sure every thing is working and not any details. info
might be more technically appropriate.
The plugin system already gets the messages every 3 seconds.
If the plugin system gets it, and there's a way to save the rate to a file, I'd be fine just using that and turning off the messages in the logs completely. I'd just have to modify the decode rate chart's data source.
Would it make more sense to add a section in print_status() that prints the current decode rate for each system, and then make the timing of how often print_status() runs a user option?
Would it make more sense to add a section in print_status() that prints the current decode rate for each system, and then make the timing of how often print_status() runs a user option?
Maybe we should just do that and then only print the message every 3 seconds if it's below controlWarnRate. Would simplify things, add flexibility and maybe help with debugging. The only downside is that (especially if you have a ton of recorders) it would mean more log file lines if doing it often, but I'd be willing to just address that in my copy (if I can't just use a plugin).
Would it make more sense to add a section in print_status() that prints the current decode rate for each system, and then make the timing of how often print_status() runs a user option?
This is the WAY! That makes a lot of sense. Also great point about having printing out using an Error level when it really isn't an error.... I didn't pick up on that. Let me go make this switch.
I was going to offer up to code it if there was consensus. My two thoughts are just:
*if it's print_status() time, don't output the decoding rate if it's low since the print_status() is going to do it
*it should maybe be error level if the decoding rate is below controlWarnRate (something bad is happening, though it may be external to the program), otherwise info like the current the status message.
I have gotten to the point where I go days and hundreds of thousands of calls without an error, and I think that's a good thing.
I don't ever really look at the log fines unless something has happened (e.g. issue w/control channel change. which is being addressed in 792)
Lastly, I'm not seeing a plugin to get the decoding rate
Ok - I made the change so it is part of the Status message that gets printed out. I am going to leave it at the current fixed delay - which is 200 seconds. No one has asked for it to be more or less ever or to configure it, so let's see how it looks. Check the rate-status
branch
Being able to control the status updates might be a nice to have. As I said, I'm just wanting a way to get the rates for the decode rate chart yet not have them be logged 20 times (60/3) more than necessary (though the chart could easily be changed to handle more than 1 point per minute).
Lastly, if you haven't noticed, I also included a config entry for the logDir option.
If you want to leave things as-is; I'd be willing to just customize things on my end and just submit the logDir config entry (unless you want to write that, in which case I could close this).
one other minor, somewhat related issue: I think the timestamp for last_decode_rate_check
should be first set at 0 here: https://github.com/robotastic/trunk-recorder/blob/7bb5742a88d816d4ce3f8a68053d70e2e9cffbc8/trunk-recorder/main.cc#L1433
Now, there's a delay at startup because the program waits for the 3 seconds of no control channel messages to do a retune.
Well, although it meant another day delay in getting things upgraded, I've decided to switch to logging the rates in a separate file. See https://github.com/robotastic/trunk-recorder/commit/0cd6c66c240abf2fd20017bea6e36991950d4695
While it means another file, it will mean not having to grep through what can be very large log files, not needing to keep (or even have) log files, and save size vs the multiple "Control Channel Message Decode Rate:" messages per cycle. Need to still update the decoding chart.
I can change this PR to that method if this is desirable. I now realize there is a (undocumented?) plugman_system_rates(systems, timeDiff);
method to put it in a plugin; I like the efficiency using the existing for loop. Also, while I did use the exiting last_status_time
variable and turn off the status message printing (personal preference now), creating another variable is a very simple thing.
I now realize there is a (undocumented?)
plugman_system_rates(systems, timeDiff);
method to put it in a plugin; I like the efficiency using the existing for loop.
Yup. There's a lot of power in the plugin architecture to solve exactly something like this. Here is an example usage in the MQTT plugin
https://github.com/robotastic/trunk-recorder-mqtt-status/blob/d37d02ea131dc758f079ee88181bcb2e42a47a48/mqtt_status_plugin.cc#L85
but it's likely pretty simple to build a single-purpose plugin to only output system_rates
wherever and however frequently one might want.
Sorry - I have been over focused on trying to debug why my OpenMHz code isn't working.
I don't see the logDir
config as part of this PR. Would you create a separate PR for that?
The creation of a log file for building a decode rate chart is best handled by a plugin. Let me know if there are any kind of changes required to the plugin system to help support it. I think everything should be there...
For the last_decode_rate_check
wouldn't you have to wait 3 seconds anyhow to accumulate enough messages to see if you can receive things well enough or not?
No problem and thanks, as I've noticed some wonkyness with the site. As for logDir, see https://github.com/robotastic/trunk-recorder/pull/666/commits/bc23e8805c388c7c6eb557706a6d18151f098782 I can put it in another PR if we're unsure about this one. The actual functionality should already exist.
As for decode rate logging, would you like me to switch this PR to using the method in https://github.com/robotastic/trunk-recorder/commit/0cd6c66c240abf2fd20017bea6e36991950d4695 , and then someone (maybe me) can move it to use a plugin?
Yup. There's a lot of power in the plugin architecture to solve exactly something like this.
If it was me I'd just have a lot of places where it could optionally run a script with certain variables, but I agree that having access to the actual internal data is useful, and adding code in a separate plugin file prevents potential future conflicts.
For the last_decode_rate_check wouldn't you have to wait 3 seconds anyhow to accumulate enough messages to see if you can receive things well enough or not?
But at startup there aren't any messages. What I'm seeing is it gets set to the current time at https://github.com/robotastic/trunk-recorder/blob/7bb5742a88d816d4ce3f8a68053d70e2e9cffbc8/trunk-recorder/main.cc#L1433 then compared to the current time-3 secs at https://github.com/robotastic/trunk-recorder/blob/7bb5742a88d816d4ce3f8a68053d70e2e9cffbc8/trunk-recorder/main.cc#L1505-L1507 At startup, the correct last check time is never.
This is now a plugin that logs the decoding rates to decode_rates.csv in the capture_dir. A header line is put in the file each time the program started. This only logs the rate over the last 3 seconds; I am thinking of having it calculate the rates over the entire interval by saving and adding to the numbers each time the plugin is called.
@robotastic this should be good to go, except for the rates being for only the proceeding 3 seconds. I know I could create a vector in the plugin to save the rates if it's not time to log yet, but I'd like a separate counter (or maybe plugin call) comparable to sys->set_message_count()
https://github.com/robotastic/trunk-recorder/blob/e33e441278ae130a6884d464a5d75b97655ca5fd/trunk-recorder/main.cc#L1480 so it could be independent of check_message_count()
and log at any chosen interval.
I was thinking of maybe creating a plugin call in handle_message()
https://github.com/robotastic/trunk-recorder/blob/e33e441278ae130a6884d464a5d75b97655ca5fd/trunk-recorder/main.cc#L1228 which could be useful for other ways to handle trunking messages. Your thoughts?
Also, ignore what I said about the last_decode_rate_check
; on closer inspection it's not waiting for the time_diff at startup like I thought it was.
Thinking about and looking at things, how about I (maybe in a separate issue, before this one gets addressed) create a handle_message
plugin that passes the message and system and then move the following 2 sections that are only used for unit script to that plugin? I'm guessing things could be simplified by having one function with an if statement to set the actual text sent. I know this would be a change, but it would move the stuff out of trunk-recorder, the non-call plugins aren't documented anyways, and I don't think there's going to be a lot of demand for one dedicated to e.g. acknowledge responses.
https://github.com/robotastic/trunk-recorder/blob/592e551f9f7153986ab2f439e4bbc3e0f54e8a38/trunk-recorder/main.cc#L1257-L1300
https://github.com/robotastic/trunk-recorder/blob/592e551f9f7153986ab2f439e4bbc3e0f54e8a38/trunk-recorder/main.cc#L964-L996
Hmm... good question. It probably makes sense to leave that processing part of Trunk Recorder. It makes it obvious for people writing plugins that those message types exist. Over on the Discord server they are working on an MQTT plugin that pulls in lots of these data points. What I should probably do is update the Docs so people know that these hooks exist.
But, it might also be good to create a plugin hook that just passes the message string and system to the plugin - depending on what you are doing it might be easier to just have a switch statement in your plugin instead of a lot of functions.
so it could be independent of
check_message_count()
and log at any chosen interval. I was thinking of maybe creating a plugin call inhandle_message()
https://github.com/robotastic/trunk-recorder/blob/e33e441278ae130a6884d464a5d75b97655ca5fd/trunk-recorder/main.cc#L1228
Good point - this call could also be used for just counting the number of messages.
When you plugin is ready to go, can you roll it into its own Repo? I have a few plugins I created that could be examples: https://github.com/robotastic/trunk-recorder-mqtt-status https://github.com/robotastic/trunk-recorder-mqtt-statistics Having a separate repo keeps the build times down and let you make updates without having to do a PR.
This reminds me that I also have to create a section in the README to track the Plugin Repos.... I will go do that now. Once you have your repo, just submit a PR with a link to the repo.
It makes it obvious for people writing plugins that those message types exist.
I would just have documentation that they do, or point people to the plugin.
Over on the Discord server
I know I should give in and create an account and go through all of/join the discussion there (why doesn't GitHub have a chat function? Or at least a discussion section just like "Issues" but not for specific bugs/feature requests/etc.)
depending on what you are doing it might be easier to just have a switch statement in your plugin instead of a lot of functions.
Exactly. I said "if statement" but switch would be more appropriate if handling a lot of types.
Good point - this call could also be used for just counting the number of messages.
Which is just what I want to do. :)
Having a separate repo keeps the build times down and let you make updates without having to do a PR.
As long as there's simple, explicit instructions on how to add external plugins (how to merge the code in). I'll modify this PR to just be a handle_message
plugin hook and the changes to README.md I made. But let me know if you decide that the unit change stuff should be moved too. I could create a repo for that too.
EDIT: looks like you'd just need a "if including, add_plugin" here https://github.com/robotastic/trunk-recorder/blob/0dbe2f3e3974b3e959394452f7fec808e7d0f470/trunk-recorder/main.cc#L574 and "if exists add_directory" here https://github.com/robotastic/trunk-recorder/blob/0dbe2f3e3974b3e959394452f7fec808e7d0f470/CMakeLists.txt#L331