python-elgato-streamdeck Recover from suspend and resume

Is your feature request related to a problem? Please describe.

The API does not allow one to detect and recover from a suspend/resume cycle.

In my observations (on Linux), the unique device ID changes when you unplug the device and plug it back in. This makes it fairly easy to discover that a device was removed, so you can recover. The typical strategy is to enumerate all the devices on a separate thread and compare the result with the previously known list of IDs.
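A minimal sketch of that enumerate-and-compare strategy. The names `diff_ids`, `watch`, `list_ids` and `on_change` are hypothetical and not part of the library; `list_ids` stands for any callable that returns the currently enumerated IDs.

```python
import threading


def diff_ids(known, current):
    """Return (added, removed) between two collections of device IDs."""
    known, current = set(known), set(current)
    return current - known, known - current


def watch(list_ids, on_change, stop, interval=1.0):
    """Poll list_ids() until `stop` is set, reporting any change in IDs.

    list_ids:  hypothetical callable returning the currently enumerated IDs
    on_change: hypothetical callback receiving (added, removed) sets
    stop:      threading.Event used to end the loop
    """
    known = set(list_ids())
    # Event.wait() returns False on timeout, so the loop runs once per interval.
    while not stop.wait(interval):
        current = set(list_ids())
        added, removed = diff_ids(known, current)
        if added or removed:
            on_change(added, removed)
        known = current
```

To run it in the background, start it with `threading.Thread(target=watch, args=(list_ids, on_change, stop), daemon=True)`. With the library installed, `list_ids` could be something like `lambda: [d.id() for d in DeviceManager().enumerate()]`, assuming `id()` is the identifier being compared.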

However, when a computer is suspended and resumed, the device ID does not change. This means you can't detect a suspend/resume event the same way you detect a removal. When a computer is suspended, the Stream Deck is reset (presumably by the operating system) and will show the default screen. The read thread (in StreamDeck._read()) will likely be the thread that hits the error first. The error kills that thread, but nothing attempts to call Close or even clear the handle. From the "outside" the StreamDeck object seems perfectly valid, except that the key events will no longer fire. The only way to detect the failure is to actually try to write something to the StreamDeck. There does not seem to be an efficient way to "poll" the deck to figure out whether the handle is valid or whether communication is possible, short of constantly reading the serial number, updating an image, etc.

Please note that calling StreamDeck.connected() will continue to return true after a suspend/resume even though the read thread is dead. This is because it's comparing its own instance ID with the list of the enumerated IDs, which of course did not change and therefore is still in the list.

Describe the solution you'd like

I think there are a couple of solutions to consider:

is_open() property

Add a property called is_open to Transport.Device that simply returns whether the handle is not None.
Implement this property in StreamDeck by calling Transport.Device.is_open().
When a TransportError happens inside the _read method, call Close() and ignore errors.
Since Close() sets the handle to None, is_open() will return False.
This will allow someone from the outside to monitor StreamDeck.is_open() and detect a failure.
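The steps above could look roughly like the following sketch. `Device` and `Deck` are simplified stand-ins for Transport.Device and StreamDeck, not the library's real classes, and `read_fn` is a hypothetical hook used to simulate the hardware.

```python
import threading


class TransportError(Exception):
    """Stand-in for the library's transport error (name taken from the issue)."""


class Device:
    """Hypothetical transport device; only the handle bookkeeping is modelled."""

    def __init__(self, read_fn):
        self._read_fn = read_fn  # hypothetical hook simulating a HID read
        self._handle = None

    def open(self):
        self._handle = object()

    def close(self):
        self._handle = None

    def is_open(self):
        return self._handle is not None

    def read(self):
        return self._read_fn()


class Deck:
    """Hypothetical StreamDeck wrapper that delegates is_open() to the device."""

    def __init__(self, device):
        self.device = device
        self.keys_seen = []

    def is_open(self):
        return self.device.is_open()

    def _read(self):
        # Loop ends once the handle is cleared, either by the user or on error.
        while self.device.is_open():
            try:
                self.keys_seen.append(self.device.read())
            except TransportError:
                try:
                    self.device.close()
                except Exception:
                    pass  # ignore errors while closing a broken handle
```

A monitoring thread can then check `deck.is_open()` periodically and re-open the device when it returns False.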

BTW It's not clear to me if there is ever a situation where a TransportError inside the _read() method is recoverable from. The user of the API is not able to detect that it happened (unlike an exception from a write operation which bubbles to the caller).

closed callback

Take advantage of the _read method's thread to detect a failure and, as in the is_open() option, close the device.
Instead of having to poll a property, the StreamDeck object will call a user specified callback.
The callback can be optionally set during Device.Open. Call the callback whenever Close is called.
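A rough sketch of the callback variant. The `Deck` class, the `on_close` parameter and the `read_fn` hook are all hypothetical names for illustration, and firing the callback only when the device was actually open is a choice made for this sketch, not something the library defines.

```python
class TransportError(Exception):
    """Stand-in for the library's transport error (name taken from the issue)."""


class Deck:
    """Hypothetical device that notifies the user whenever it is closed."""

    def __init__(self, read_fn):
        self._read_fn = read_fn  # hypothetical hook simulating a HID read
        self._handle = None
        self._on_close = None

    def open(self, on_close=None):
        # The callback is optional and supplied at open time.
        self._handle = object()
        self._on_close = on_close

    def close(self):
        if self._handle is None:
            return  # already closed, so the callback is not fired again
        self._handle = None
        if self._on_close is not None:
            self._on_close(self)

    def _read(self):
        # Would run on the reader thread; a transport error closes the device,
        # which in turn fires the callback and ends the loop.
        while self._handle is not None:
            try:
                self._read_fn()
            except TransportError:
                self.close()
```

Note that in this sketch the callback runs on the reader thread when the close is caused by a read error, so user code that touches shared state would need its own synchronization.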

probe() method

Similar to Transport.probe() to see if you have a healthy/installed back-end, provide a probe method on the device itself.
Not sure what one can do here - but perhaps attempting a "0 byte read" will allow us to issue an instruction which is a no-op.
The downside with this approach is that we're actually interacting with the device, not just checking a handle/state, so it has to be very quick and supported on all Stream Decks.

Handle outside the streamdeck API.

Hook into the operating system's suspend/resume event system and close/open the device as a recovery strategy.

I'm happy to do the work and submit a pull request, but would like some input on what you think about this problem and how one can solve it. Also, I did not try this on Windows or Mac. The ID behaviour could be different on those platforms, but since any TransportError inside _read() is undetectable, I think the library will benefit from a solution either way.

Sep 16 '21 20:09 dodgyrabbit

This is a good idea, and one I want to think about a little (sorry for the horrific delay in my response, I've been a little busy of late).

System suspend/resumes are tricky to detect in a good, cross-platform manner. Even on Linux, not all systems use the DBus-based messaging bus for this. However, making the library properly detect the death of the internal reader thread and report that the device handle has closed is definitely something that needs to be implemented, along with a user-defined closed callback.

Oct 10 '21 03:10 abcminiuser

Oh, I forgot to give my immediate reaction: I think your idea of adding a second is_open() property for the logical open state is a good one, leaving connected() to just report the physical connection state (as it is documented to do now).

We'd need a set_close_callback()/set_close_callback_async() that can be used to bind a user callback to close events, which would need to be triggered both in the error state and in the normal destruction state for consistency.

Currently all read errors are considered fatal, as there's nothing that can reasonably be done to recover on errors except for closing and re-opening the device handle to resync the state with the device.
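As an illustration of that close-and-re-open recovery, here is a minimal sketch. The `reopen` helper and its parameters are hypothetical; it only assumes the deck object exposes `open()` and `close()` methods.

```python
import time


def reopen(deck, attempts=5, delay=1.0, sleep=time.sleep):
    """Try to close and re-open `deck`, returning True on success.

    Hypothetical helper: broad exception handling is used because the exact
    error raised by a stale handle is not assumed here.
    """
    for attempt in range(attempts):
        try:
            deck.close()
        except Exception:
            pass  # the handle may already be unusable
        try:
            deck.open()
            return True
        except Exception:
            # Wait before retrying, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(delay)
    return False
```

After a successful re-open, the caller would still need to redraw key images and re-register callbacks, since the device shows its default screen after a reset.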

Oct 10 '21 03:10 abcminiuser

Can we get an update on this? I have the same problem.

Also, I think because I am running from a VM, sometimes the device just "disconnects" and I need to restart the VM because a hidden file handle/lock is not closed correctly.

Just an easy way to recover from this would be golden.

Jan 05 '22 19:01 AskMeAgain