companion [RFC] Instance status

[RFC] Instance status

Open Julusian opened this issue 2 years ago • 3 comments

Describe the feature

Instances have two methods they can use to report status/information back to the user https://github.com/bitfocus/companion/blob/df1ca54d523547faebe4377a80837bf97026d552/instance_skel.d.ts#L158-L160

log will append lines only visible in the log tab status will change the device status shown in the connections tab. If set to error, it will also show at the top of buttons which will not work correctly.

The status method is being used inconsistently in modules, leading to the following issues:

Reconnecting is often shown as a warning, which hides that instances are unavailable. They briefly flash through error state
Some stateless (http based) instances set the status to bad when they encounter an error, and ok when they succeed. Should this really show on every button using that instance? Other actions are likely to remain working

We should formulate a plan on how the status and log should be used, which may involve changing the levels the status has.

Potential ideas:

show error and warning status on buttons. with warning being fed by warning/error log lines
have more concrete status levels (ready/reconnecting/disconnected/authentication_failure/unknown_error) and anything outside of that should only be logged. Companion will present those new levels in a new way
Do we need a separate status function at all? Perhaps it could all be done by counting log lines (not sure about this, it will cause weird behaviour if a module logs an error once, but doesnt keep logging that the connection failed)
Should the status function have the second argument? Or should that be reported solely in the log

Other ideas are welcome, there is no plan on this yet, only a desire to improve it

As part of this I do think we should rework the ui to be able to show a log per instance, and possibly look at making the log or some of the log visible in other views. Should we show error lines while in the other views too?

Feb 21 '22 22:02 Julusian

Do we need a separate status function at all? Perhaps it could all be done by counting log lines (not sure about this, it will cause weird behavior if a module logs an error once, but doesn't keep logging that the connection failed)

I have done this intentionally (only one log message) to prevent polluting the log with 'Reconnecting...' every 5 seconds. I reapply the status because the error/warning box blinks when the reconnect occurs. This lets me know the module is working, just not connected.

Feb 22 '22 01:02 istnv

I've given quite a bit of thought to this and I'm of the opinion that this.status, this.log, and this.debug can be consolidated into into a single method that can then appropriately route the signals. This actually goes back to this Slack thread trying to get more robust logging/export from the field to developers.

First, I think we need to establish a full enum of all the conditions we want to track (@Julusian started that). Second, we need to establish the use-cases for each of those (taking into account how messages may repeat, such as for TCP reconnects). Third, we need to define the routing to the status readout, bank notifications, log, and debug.

Going back to the Slack discussion, it would be good to have a debug 'dump to file' capability for the modules. To do this, I think each instance should have a built-in 'Debug' checkbox with a message like 'Enable this if a developer asks you to'. From a debug perspective that would enable all processing for any DEBUG messages and it would enable a dump to file for that module for all status/log/debug. This does mean there wouldn't be debug messages for the module unless this is enabled. To help developers, we can read process.env.DEVELOPER and just bypass that instance config checkbox.

In terms of bank display, I think we need to establish two types of notification:

Connection state/error - basically what we have now (but need to show warn/yellow state if that is kept)
Other warning/error - a type of error that is action/feedback/data related indicating something went wrong, but not with the connection. This could be a one-time thing or even passing an error from the device (such as a temp alarm from a projector). These could be a colored exclamation mark with red and yellow states similar to connection state. This could be something that needs to be cleared, similar to a peak alert on an audio meter.

Feb 24 '22 18:02 krocheck

I think we are settled on this.debug to be combined into this.log.

Going back to the Slack discussion, it would be good to have a debug 'dump to file' capability for the modules

Yes, I think this is a good idea, and will be pretty easy to hook whatever api we come up with to begin logging to a file on demand.

As for how to handle this.status:

have more concrete status levels (ready/reconnecting/disconnected/authentication_failure/unknown_error) and anything outside of that should only be logged. Companion will present those new levels in a new way

I think this is the approach to take. We can start off with a few, and its not a big deal to add more in later releases. Docs should make the available levels clear, and that we are open to adding more. I expect this list to grow as we start porting modules to use this, both by us and other devs needing things we didnt think of.

I propose we start with:

ok (maybe called good or ready?)
connecting
disconnected
connection_failure (ie, failed auth, connection rejected)
invalid_config
unknown_error
unknown_warning

Do we need a separate status function at all? Perhaps it could all be done by counting log lines (not sure about this, it will cause weird behaviour if a module logs an error once, but doesnt keep logging that the connection failed)

I agree with my concern here, this will behave too unpredictable to be useful. Module devs do some weird stuff, so expecting logging to confirm to rules to make this work sounds like a bad idea. Maybe we should do the opposite and log every call to this.status at some level, for debugging/tracability purposes.

Should the status function have the second argument? Or should that be reported solely in the log

I am currently thinking that this second argument is useful, to provide some more context on the enum value. It should be made more prominent in the ui

In terms of bank display, I think we need to establish two types of notification:

Connection state/error - basically what we have now (but need to show warn/yellow state if that is kept)

Yes. I think we will be able to classify each of the status enum values into 'things will be broken' (red), 'something is wrong, but it should be ok' (yellow) and 'this is fine' (hidden)

Other warning/error - a type of error that is action/feedback/data related indicating something went wrong, but not with the connection.

I agree. Some way of showing that an action failed to run. I think this will need some experimenting and input from users, so lets make a separate task/issue to think about this. The data will be sourced directly from the function call to execute the action.
I feel like an exclamation mark will be too subtle. Perhaps a red border is drawn on the button for a second or two? Or the whole button goes red for a second? There are many ways this could be done, and will probably need to be opt-in/out

May 09 '22 14:05 Julusian

For me it seems there are two different things: status of the connection itself and status of the controlled device. I think that in the instance status only the status of the connection should be reflected. Maybe some module developers misused that for showing also status of the controlled device. I like the idea of an additional exclamation mark as a second hint of the device status in the topbar. So for me it would be: keep the status as it is and add another device status. More predefined status values are good for hopefully making the modules more consistent. Also I'm not the biggest fan of too much meshing logging and status. Status is a state during a period of time and logging is about single events. I'm perfectly fine with automatic logging of status changes as a convenience but I wouldn't try to make too much out of it. In my opinion log events are categorized best by severity, like debug, info, warn, error, critical. I'd vote for giving the module developer the freedom of what to log and how to set the status independently. I know devices where I want to log some errors but don't want to bother the user with an exclamation mark. Then it should be possible either for the module to reset the status or for the user to do it. Workflow maybe like:

Device generates a temperature warning -> red exclamation mark turns on, log entry
Device fails to do something -> exclamation mark stays on, log entry
Temperature normalizes -> the condition leading to exclamation mark in 1 is cleared, bit the mark still stays on because second condition still is not cleared
User needs to dismiss the condition of point 2 in the log to clear the mark I can also think of an option to clear a condition after a timeout. After all it should give more confidence of your device and a better monitoring but it shouldn't distract the user and make him constantly check the logs.

Oct 17 '22 23:10 dnmeid

I think everything I wanted to achieve with this task has been done in develop. The remaining bits to think about have a new task #2168

@dnmeid

For me it seems there are two different things: status of the connection itself and status of the controlled device. I think that in the instance status only the status of the connection should be reflected.

I would say that rather than the status of the connection, that should be the 'ability to do things on the device'. Saying that a device is 'ok' isn't exactly accurate if we know that every action will fail because of bad authenication.

But as long as actions/feedbacks can be run against that device/connection, then yes the status should be ok.

I like the idea of an additional exclamation mark as a second hint of the device status in the topbar.

Im not sure about this. I think I need some more examples of what would make use of that.
Remember that users can do things like this with feedbacks, so in what scenarios should a module developer be able to force an icon onto possibly affected buttons?

A good example of this could be reporting that only one of the two ATEM psu are connected. In some scenarios, this could be useful information, but to others this wont be useful information. But if you care, then a warning on each button could be the best way to be told, but it would definitely need to be opt-in per instance.

User needs to dismiss the condition of point 2 in the log to clear the mark

I don't like this. Having to go into the ui to do something isn't very user friendly. Users have already asked many times for the ability to rescan for streamdecks without opening the ui, so I don't think having to go there to dismiss warnings that you may not want or care about will be popular.

I'm hoping that maybe #2168 will cover this well enough, but if not lets open another issue about that.

Nov 08 '22 16:11 Julusian

companion companion copied to clipboard

[RFC] Instance status

Describe the feature

companion
companion copied to clipboard