iris
Added fix for messages occasionally reappearing after power loss
This fixes a relatively rare issue that can cause messages to reappear after DMS power loss. The issue has not recurred during an extended testing period in Nebraska.
I don't think this change fixes the described problem. The setCommAndPower method is only called for a small subset of messages -- those triggered by an action plan (not operator) with an indefinite duration. The "sticky" flag must be enabled for this, otherwise the message will have a duration of 3 polling periods.
This "sticky" feature is needed by MnDOT for DMS that mimic permanent signs -- for example, the sign on I-94 eastbound at Huron Blvd is required to be displayed continuously, even after comm. loss.
@jlstanley-git can maybe chime in since he understands the issue better than I do, but I believe that is the issue that we were running into. Nebraska uses a number of sticky action plans (since their signs sometimes lose communication due to network disruptions), and I think the message in question was related to one of those (though I don't quite remember). Sorry if my initial message wasn't clear.
@DougLau
The patch doesn't change the permanence of sticky messages while they're deployed. It changes what happens after a sticky message is blanked by IRIS and then a communications-loss or long-power-recovery occurs.
The old code sets sticky messages to be re-displayed from changeable.1 whenever either event occurs. Since changeable.1 isn't changed by IRIS when the sign is blanked, this causes signs that have been blanked by IRIS to sometimes mysteriously re-display an old message (the one that's still sitting in changeable.1) on comm-loss or power-recovery.
The new code sets sticky messages to be re-displayed from currentBuffer.1 whenever either event occurs. Since currentBuffer.1 holds the most recently displayed message, sticky messages will still faithfully re-display after comm-loss and long-power-loss events. And, since currentBuffer.1 is changed when the sign is blanked by IRIS, this means a sign that IRIS blanks will stay blanked, fixing the problem.
It's true that changeable.1 isn't changed when the sign is blanked by IRIS, but both dmsCommunicationsLossMessage and dmsLongPowerRecoveryMessage are changed to blank.1. These objects can't be responsible for this problem after an IRIS-initiated blank.
In theory, changing these to currentBuffer.1 (as this patch does) would not be a problem -- BUT we have devices supporting only NTCIP 1203v1. That version of the spec. barely even mentions currentBuffer! I have doubts that it will work correctly with those old signs.
Is it possible that dmsPowerLossMessage is somehow involved in this? IRIS doesn't access that object currently.
You are correct about the messageIDs being changed to blank.1 when IRIS blanks the sign.
Since changing IRIS to use currentBuffer.1 did fix the problem for the customer (and this IS the very type of situation that currentBuffer.1 was created to solve), it seems unlikely that dmsPowerLossMessage is involved.
I apologize for forgetting to mention one detail about the problem we were trying to fix. Field engineers and state patrol in the state in question also sometimes blank signs from sign handhelds (and occasionally other tools). When they do this, the messageIDs are not reset to blank.1, leaving the signs in the problem state. Since the zombie messages (so called because they mysteriously kept coming back to life) only reappear when a sticky message was deployed from IRIS, the customer wanted us to try to fix things from the IRIS side before they take on the almost impossible task of retraining a host of field users (many of whom are not in their chain of command) out of bad habits.
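To make the failure sequence concrete, here is a toy model of the scenario described above. It is only a sketch: the `Sign` class, the `MsgId` enum, and every method name are hypothetical stand-ins, not IRIS code and not a faithful NTCIP 1203 implementation. It assumes a handheld blank clears the display but leaves the comm-loss recovery object untouched, and that activating currentBuffer.1 simply keeps whatever the sign is showing.

```java
/** Toy model of the "zombie message" scenario; not IRIS code. */
final class ZombieMessageDemo {

    /** Hypothetical stand-ins for the message IDs discussed in this thread. */
    enum MsgId { BLANK_1, CHANGEABLE_1, CURRENT_BUFFER_1 }

    /** Minimal sign state: one changeable slot, the display, one recovery pointer. */
    static final class Sign {
        String changeable1 = "";            // contents of the changeable.1 slot
        String currentBuffer1 = "";         // what the sign is showing right now
        MsgId commLossMsg = MsgId.BLANK_1;  // models dmsCommunicationsLossMessage

        /** IRIS deploys a sticky message and chooses where recovery reads from. */
        void deploySticky(String multi, MsgId recoverFrom) {
            changeable1 = multi;
            currentBuffer1 = multi;
            commLossMsg = recoverFrom;
        }

        /** Assumed handheld behavior: display cleared, recovery pointer untouched. */
        void blankFromHandheld() {
            currentBuffer1 = "";
        }

        /** On comm loss the sign activates whatever the recovery pointer names. */
        void onCommLoss() {
            switch (commLossMsg) {
                case BLANK_1:
                    currentBuffer1 = "";
                    break;
                case CHANGEABLE_1:
                    currentBuffer1 = changeable1; // stale message comes back
                    break;
                case CURRENT_BUFFER_1:
                    break; // keep showing whatever is already displayed
            }
        }
    }

    /** Runs deploy, handheld blank, comm loss; returns what the sign shows. */
    static String displayedAfterHandheldBlank(MsgId recoverFrom) {
        Sign s = new Sign();
        s.deploySticky("ROAD CLOSED", recoverFrom);
        s.blankFromHandheld();
        s.onCommLoss();
        return s.currentBuffer1;
    }

    public static void main(String[] args) {
        System.out.println("old: [" + displayedAfterHandheldBlank(MsgId.CHANGEABLE_1) + "]");
        System.out.println("new: [" + displayedAfterHandheldBlank(MsgId.CURRENT_BUFFER_1) + "]");
    }
}
```

In this model the changeable.1 pointer brings the old message back after a handheld blank, while the currentBuffer.1 pointer leaves the sign blank.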
Would you be open to trying a hybrid solution? We first try currentBuffer.1. If that returns a bad-value or no-such-name, then we use changeable.1. (If MnDOT's v1 signs don't respond with either of those errors, we can use some other method to detect 1203v1/1203v2+ compatibility.)
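A rough sketch of the fallback logic I have in mind is below. The `Sign` interface, the `SetResult` enum, and `chooseRecoveryMessage` are hypothetical names for illustration only; they are not existing IRIS classes, and the real change would go through the existing NTCIP operation code. The sketch also assumes a v1 sign rejects the SET with bad-value or no-such-name, which still needs to be confirmed on real hardware.

```java
/** Sketch of the proposed hybrid: try currentBuffer.1, fall back to changeable.1. */
final class HybridRecoveryDemo {

    /** Hypothetical outcome of an SNMP SET on the recovery-message objects. */
    enum SetResult { OK, BAD_VALUE, NO_SUCH_NAME }

    /** Hypothetical stand-ins for the two candidate message IDs. */
    enum MsgId { CHANGEABLE_1, CURRENT_BUFFER_1 }

    /** Hypothetical abstraction of "set the comm-loss / power-recovery message". */
    interface Sign {
        SetResult setRecoveryMessage(MsgId id);
    }

    /** Returns the message ID that the sign ended up being configured with. */
    static MsgId chooseRecoveryMessage(Sign sign) {
        SetResult r = sign.setRecoveryMessage(MsgId.CURRENT_BUFFER_1);
        if (r == SetResult.OK) {
            return MsgId.CURRENT_BUFFER_1;
        }
        // bad-value or no-such-name: assume an older (1203v1) sign and
        // fall back to the previous behavior.
        sign.setRecoveryMessage(MsgId.CHANGEABLE_1);
        return MsgId.CHANGEABLE_1;
    }

    public static void main(String[] args) {
        Sign newerSign = id -> SetResult.OK;
        Sign olderSign = id ->
            id == MsgId.CURRENT_BUFFER_1 ? SetResult.BAD_VALUE : SetResult.OK;
        System.out.println("newer sign uses " + chooseRecoveryMessage(newerSign));
        System.out.println("older sign uses " + chooseRecoveryMessage(olderSign));
    }
}
```

If the v1 signs turn out to accept the SET silently, the error check would need to be replaced by some other compatibility test; the fallback branch itself would stay the same.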
I think that solution is acceptable.
See #138 for alternate fix