opcua-asyncio
How to properly handle connection issues and reconnecting? (request for comments)
Hello everyone!
First off, thanks a lot to every contributor to this repository; it's a great library that has helped us out tremendously in multiple projects. Secondly, I hope it's okay that I'm using an issue to open a discussion. I'd like to gather some insights from people who are more knowledgeable than I am about OPC-UA and this library, hoping that I'll be able to contribute a well-rounded feature out of this discussion in the future.
The topic is handling connection issues and reconnecting properly. Right now, whenever our application loses the connection to the OPC-UA server (for example, because the PLC config changed and the PLC is reloading the server), we reconnect the client from our application the next time we try to interact with a node and it fails: we catch the `UaError` and simply retry connecting a few times. This was fine until subscriptions came into play. With subscriptions, I'm really having a hard time finding the proper way to detect issues, reconnect and restart the subscriptions.
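For reference, the retry-on-failure approach described above boils down to something like the following minimal sketch. `call_with_reconnect` and the `ReconnectableClient` protocol are hypothetical names used only for illustration; the sole assumption about the client is that it exposes awaitable `connect()` and `disconnect()` methods. In our application, `error_type` would be the `UaError` mentioned above.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Protocol, TypeVar

T = TypeVar("T")


class ReconnectableClient(Protocol):
    # Hypothetical minimal interface: only what the retry helper needs.
    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...


async def call_with_reconnect(
    client: ReconnectableClient,
    operation: Callable[[], Awaitable[T]],
    error_type: type[Exception] = Exception,
    max_reconnects: int = 3,
    delay: float = 1.0,
) -> T:
    """Run operation(); on error_type, reconnect and retry up to max_reconnects times."""
    reconnects = 0
    while True:
        try:
            return await operation()
        except error_type:
            if reconnects >= max_reconnects:
                raise  # out of attempts: surface the last error to the caller
        reconnects += 1
        # Best-effort teardown of the broken session before opening a new one.
        try:
            await client.disconnect()
        except Exception:
            pass
        await asyncio.sleep(delay)
        try:
            await client.connect()
        except Exception:
            pass  # the next operation() call fails again and triggers another round
```

This only reacts when the application happens to touch a node, which is exactly why it breaks down with subscriptions: while we are just waiting for data change notifications, nothing ever calls `operation()`, so nothing notices the lost connection.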
I've found the `Client._monitor_server_loop()` method, which is started as a task and stored in `Client._monitor_server_task`. Once the connection dies, it informs the subscriptions of the `BadShutdown`. This seems to be about the only way to be notified of a connection issue, other than emulating that behaviour outside the client by polling and catching errors when they are raised. Another way of detecting connection issues is the `Client.check_connection()` method, but again, this method has to be polled by the application, outside the client.
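Polling `check_connection()` from the application could look roughly like this sketch. It assumes that `check_connection()` is a coroutine that raises once the connection is gone; `connection_watchdog` and its `on_lost` callback are hypothetical names, not part of the library.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Protocol


class CheckableClient(Protocol):
    # Assumed: raises (any Exception) when the connection is broken.
    async def check_connection(self) -> None: ...


async def connection_watchdog(
    client: CheckableClient,
    on_lost: Callable[[Exception], Awaitable[None]],
    interval: float = 1.0,
) -> None:
    """Poll the client every `interval` seconds; report the first failure and stop."""
    while True:
        await asyncio.sleep(interval)
        try:
            await client.check_connection()
        except Exception as exc:  # CancelledError is a BaseException, so cancelling still works
            await on_lost(exc)
            return
```

The application would start this with `asyncio.create_task(...)` right after connecting and cancel it on shutdown. That bookkeeping is exactly what every application currently has to duplicate outside the client.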
I think that, ideally, the client itself should provide a mechanism that lets applications react to connection issues and connection states in general, e.g. a callback when the client loses the connection. On top of that, it should implement an optional reconnect mechanism that, when enabled, automatically attempts to reconnect upon losing the connection, including restoring any subscriptions.
My current proposal would be the following:
- Add three `asyncio.Event` instances: `Client.connected`, `Client.disconnected` and `Client.failed`. These events are `set()` when the respective connection state is reached and `clear()`-ed when that state is left. This would allow application code to simply `await client.connected.wait()` before each interaction with the client. It would also allow running error-handler tasks once the connection fails with `await client.failed.wait()`.
- Maybe add a set of methods `Client.add_connected_callback()`, `Client.add_disconnected_callback()` and `Client.add_failed_callback()` to register callback functions which are called once the respective state is reached.
- Add a new optional parameter to `Client()`, which could be as simple as `auto_reconnect: bool = False`.
- Whenever `auto_reconnect` is enabled, the client creates an additional task `Client._auto_reconnect_task` upon connecting. This task continuously calls `Client.check_connection()`, similar to how `Client._monitor_server_loop()` works, and in case of an error automatically tries to connect the client again.
- Probably a bit more configuration is required for that feature, so maybe add a dataclass `AutoReconnectSettings`. The following settings come to mind:
  - How often to try reconnecting before giving up
  - How long to wait between connection attempts (maybe with exponential backoff?)
  - Whether or not to restart the subscriptions after re-establishing the connection
  - Maybe even allow deeper customization by pulling the reconnection logic into its own class `ClientReconnectHandler`, which would implement a simple strategy pattern to allow interchangeable reconnection mechanisms, providing an `ExponentialBackoffReconnectHandler` by default. The parameter could then have the signature `auto_reconnect: bool | ClientReconnectHandler = False`, applying a default handler with default values when simply set to `True`.
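To get a feel for how the first two points would behave, here is a small stand-alone sketch of the state tracking. Everything in it is hypothetical: `ConnectionStateTracker` stands in for the bookkeeping the `Client` would do internally, `add_callback()` plays the role of the three proposed `add_*_callback()` methods, and `transition()` is an assumed private hook the client would call whenever its connection state changes. Callbacks are assumed to be coroutine functions.

```python
from __future__ import annotations

import asyncio
import enum
from typing import Awaitable, Callable

Callback = Callable[[], Awaitable[None]]


class ConnectionState(enum.Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    FAILED = "failed"


class ConnectionStateTracker:
    """Hypothetical: one asyncio.Event per state, set on entry and cleared on exit."""

    def __init__(self) -> None:
        self.connected = asyncio.Event()
        self.disconnected = asyncio.Event()
        self.failed = asyncio.Event()
        self._events = {
            ConnectionState.CONNECTED: self.connected,
            ConnectionState.DISCONNECTED: self.disconnected,
            ConnectionState.FAILED: self.failed,
        }
        self._callbacks: dict[ConnectionState, list[Callback]] = {s: [] for s in ConnectionState}
        # A fresh client starts out disconnected.
        self.state = ConnectionState.DISCONNECTED
        self.disconnected.set()

    def add_callback(self, state: ConnectionState, callback: Callback) -> None:
        self._callbacks[state].append(callback)

    async def transition(self, new_state: ConnectionState) -> None:
        if new_state is self.state:
            return  # no change, so no events toggled and no callbacks fired
        self._events[self.state].clear()  # leaving the old state: its waiters block again
        self.state = new_state
        self._events[new_state].set()  # entering the new state: wake everyone waiting on it
        for callback in list(self._callbacks[new_state]):
            await callback()


async def main() -> None:
    tracker = ConnectionStateTracker()

    async def on_failed() -> None:
        print("connection failed, starting error handling")

    tracker.add_callback(ConnectionState.FAILED, on_failed)

    async def worker() -> None:
        await tracker.connected.wait()  # block until the client is usable
        print("connected, reading nodes")

    task = asyncio.create_task(worker())
    await tracker.transition(ConnectionState.CONNECTED)
    await task
    await tracker.transition(ConnectionState.FAILED)


if __name__ == "__main__":
    asyncio.run(main())
```

One caveat of the event-based design: `connected.wait()` only tells you the state was reached at some point, so code after it still has to handle errors from a connection that drops a moment later. The callbacks and events complement each other here, since events suit "wait until usable" while callbacks suit "react once when it breaks".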
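The configuration and strategy points could fit together roughly as follows. The class names `AutoReconnectSettings`, `ClientReconnectHandler` and `ExponentialBackoffReconnectHandler` are the ones proposed above, but their fields, the `delays()` method, the extra `FixedIntervalReconnectHandler` and the `resolve_reconnect_handler()` helper are assumptions made purely for illustration. In this sketch a handler only decides *when* to retry; the actual reconnecting would stay inside the client.

```python
from __future__ import annotations

import abc
import dataclasses
from typing import Iterator


@dataclasses.dataclass(frozen=True)
class AutoReconnectSettings:
    # Field names are assumptions, one per setting listed above.
    max_attempts: int = 5  # how often to try reconnecting before giving up
    initial_delay: float = 1.0  # seconds to wait before the first attempt
    backoff_factor: float = 2.0  # delay multiplier after each failed attempt
    max_delay: float = 30.0  # upper bound for a single wait
    restore_subscriptions: bool = True  # read by the client, not by the handler


class ClientReconnectHandler(abc.ABC):
    """Strategy interface: yields the wait time before each reconnect attempt."""

    @abc.abstractmethod
    def delays(self) -> Iterator[float]:
        """Return a fresh iterator per outage; once it is exhausted, the client gives up."""


class ExponentialBackoffReconnectHandler(ClientReconnectHandler):
    def __init__(self, settings: AutoReconnectSettings | None = None) -> None:
        self.settings = settings if settings is not None else AutoReconnectSettings()

    def delays(self) -> Iterator[float]:
        s = self.settings
        delay = s.initial_delay
        for _ in range(s.max_attempts):
            yield delay
            delay = min(delay * s.backoff_factor, s.max_delay)


class FixedIntervalReconnectHandler(ClientReconnectHandler):
    """An interchangeable alternative: the same wait before every attempt."""

    def __init__(self, interval: float, max_attempts: int) -> None:
        self.interval = interval
        self.max_attempts = max_attempts

    def delays(self) -> Iterator[float]:
        for _ in range(self.max_attempts):
            yield self.interval


def resolve_reconnect_handler(
    auto_reconnect: bool | ClientReconnectHandler,
) -> ClientReconnectHandler | None:
    """Map the proposed Client(auto_reconnect=...) value to a handler (None = disabled)."""
    if isinstance(auto_reconnect, ClientReconnectHandler):
        return auto_reconnect
    return ExponentialBackoffReconnectHandler() if auto_reconnect else None
```

For example, with `AutoReconnectSettings(max_attempts=5, initial_delay=1.0, max_delay=5.0)` the exponential handler yields the waits `1.0, 2.0, 4.0, 5.0, 5.0`. Keeping the strategy down to "a sequence of delays" makes custom handlers trivial to write and test without a server.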
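The loop behind the proposed `Client._auto_reconnect_task` could look roughly like the sketch below. It is written as a free function against an assumed client interface (awaitable `connect()`, `disconnect()` and `check_connection()`, the latter raising once the connection is gone) so it can be tried without a server. `auto_reconnect_loop`, `retry_delays` (any zero-argument callable returning the waits for one outage, e.g. an exponential backoff sequence) and `on_reconnected` (where subscriptions would be recreated) are hypothetical names.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Iterable, Protocol


class ReconnectableClient(Protocol):
    # Assumed interface; check_connection() raises once the connection is gone.
    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...
    async def check_connection(self) -> None: ...


async def _reconnect(client: ReconnectableClient, delays: Iterable[float]) -> bool:
    """Try connect() once per delay; return True on the first success."""
    for delay in delays:
        try:
            await client.disconnect()  # drop what is left of the old session
        except Exception:
            pass  # expected when the server is already gone
        await asyncio.sleep(delay)
        try:
            await client.connect()
            return True
        except Exception:
            continue
    return False


async def auto_reconnect_loop(
    client: ReconnectableClient,
    retry_delays: Callable[[], Iterable[float]],
    on_reconnected: Callable[[], Awaitable[None]],
    check_interval: float = 1.0,
) -> None:
    """Poll the connection and reconnect on failure; raise once all attempts are used up."""
    while True:
        await asyncio.sleep(check_interval)
        try:
            await client.check_connection()
        except Exception as exc:
            if not await _reconnect(client, retry_delays()):
                raise ConnectionError("giving up: all reconnect attempts failed") from exc
            await on_reconnected()  # e.g. recreate subscriptions and monitored items
```

Because `asyncio.CancelledError` derives from `BaseException` (since Python 3.8), the broad `except Exception` clauses don't swallow cancellation, so the client could still cancel this task cleanly when the application disconnects on purpose.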
I'd love to hear what you guys think about this and how you would approach it. Maybe someone has already implemented a similar reconnect mechanism and would like to share their thoughts; I'd greatly appreciate that.
Thanks a lot!