dolphin-robot icon indicating copy to clipboard operation
dolphin-robot copied to clipboard

AWS Broker disconnected / Unknown

Open bleughb opened this issue 1 year ago • 11 comments

Hi, Seeing this regularly now, my New M700 doesn't stay online for more than a day , when it becomes disconnected, it won't re-connect

bleughb avatar Mar 03 '24 23:03 bleughb

Please provide debug log level logs

elad-bar avatar Mar 04 '24 04:03 elad-bar

@elad-bar This is definitely widespread. And further, when trying to reload the integration, I am getting this error.

Failed to call service homeassistant.reload_config_entry. The config entry MyDolphin Plus (mydolphin_plus) with entry_id d1d9f283726648f34f594664a197119a cannot be unloaded because it is not in a recoverable state (ConfigEntryState.FAILED_UNLOAD)```

sh00t2kill avatar Mar 05 '24 22:03 sh00t2kill

This is what’s in my logs when the AWS Broker status changes from Connected to Disconnected:

2024-03-09 10:35:48.038 ERROR (Dummy-15) [custom_components.mydolphin_plus.managers.aws_client] AWS IoT connection interrupted, Error: AWS_ERROR_MQTT_UNEXPECTED_HANGUP: The connection was closed unexpectedly.
2024-03-09 10:35:48.038 ERROR (Dummy-15) [custom_components.mydolphin_plus.managers.aws_client] Status changed from 'Connected to the API' to 'Failed to access API'
2024-03-09 10:35:55.097 ERROR (MainThread) [homeassistant.util.logging] Exception in _on_aws_client_status_changed when dispatching 'mydolphin_plus_AWS_CLIENT_STATUS_SIGNAL': ('03944ca8023f2f881f88758cf16649dc', <ConnectivityStatus.Failed: 'Failed to access API'>)
Traceback (most recent call last):
  File "/config/custom_components/mydolphin_plus/managers/coordinator.py", line 325, in _on_aws_client_status_changed
    await self._aws_client.terminate()
  File "/config/custom_components/mydolphin_plus/managers/aws_client.py", line 155, in terminate
    future.result()
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
awscrt.exceptions.AwsCrtError: AWS_ERROR_MQTT_CANCELLED_FOR_CLEAN_SESSION: Old requests from the previous session are cancelled, and offline request will not be accept.

When I try to reload the integration I get the following:

2024-03-09 13:38:22.572 ERROR (MainThread) [homeassistant.config_entries] Error unloading entry MyDolphin Plus for mydolphin_plus
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 672, in async_unload
    result = await component.async_unload_entry(hass, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/mydolphin_plus/__init__.py", line 79, in async_unload_entry
    await coordinator.terminate()
  File "/config/custom_components/mydolphin_plus/managers/coordinator.py", line 223, in terminate
    await self._aws_client.terminate()
  File "/config/custom_components/mydolphin_plus/managers/aws_client.py", line 155, in terminate
    future.result()
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
awscrt.exceptions.AwsCrtError: AWS_ERROR_MQTT_CANCELLED_FOR_CLEAN_SESSION: Old requests from the previous session are cancelled, and offline request will not be accept.

I’ve enabled the “debug logging” option for the integration and will report back if anything interesting comes from that.

To recover right now I need to restart the whole of Home Assistant.

jasoncodes avatar Mar 09 '24 03:03 jasoncodes

Same for me, rebooting HA usually fixes it.

Scope666 avatar Mar 09 '24 04:03 Scope666

Dropped again just now but nothing interesting extra with debug logging enabled. Dolphin is cleaning right now and the “cycle_time_left” sensor kept updating after the error. The error was about 30 seconds after the last 1 minute recurring refresh. It seems to know it’s disconnected as it doesn’t try to refresh again. It doesn’t seem to try to reconnect again and the exception makes me think the AWS client is now in a bad state.

Looking at the terminate method, I am wondering if we should be unsubscribing. I don’t think that’s necessary with clean session enabled. I’m going to comment out the unsubscribe code and see what happens.

jasoncodes avatar Mar 10 '24 04:03 jasoncodes

I finally found the error in my system as well, please let me know if you got also the following error before the error you listed above:

ERROR (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Failed to post JSON to https://mbapp18.maytronics.com/api/serialnumbers/getrobotdetailsbymusn/, HTTP Status: Unauthorized (401)

It could be that something is not working with the attempt to relogin, For that i will need more logs (debug level) or will try to get to it in 1.5 week

elad-bar avatar Mar 10 '24 04:03 elad-bar

I’ve had that 401 error a fair bit in the past (typically once a day) but nothing since 2024-02-21 when I updated to the current v1.0.10 release.

jasoncodes avatar Mar 10 '24 04:03 jasoncodes

Just had my first broker disconnect since I made the patch to not attempt to unsubscribe and it looks to have worked. Disconnected for 1 minute before the poll reinitialised the connection:

2024-03-12 08:06:02.194 ERROR (Dummy-4) [custom_components.mydolphin_plus.managers.aws_client] AWS IoT connection interrupted, Error: AWS_ERROR_MQTT_UNEXPECTED_HANGUP: The connection was closed unexpectedly.
2024-03-12 08:06:02.194 ERROR (Dummy-4) [custom_components.mydolphin_plus.managers.aws_client] Status changed from 'Connected to the API' to 'Failed to access API'
2024-03-12 08:06:02.200 DEBUG (MainThread) [custom_components.mydolphin_plus.managers.aws_client] Disconnecting AWS Client
2024-03-12 08:06:07.197 DEBUG (Dummy-4) [custom_components.mydolphin_plus.managers.aws_client] AWS IoT connection was closed
2024-03-12 08:06:07.197 INFO (Dummy-4) [custom_components.mydolphin_plus.managers.aws_client] Status changed from 'Failed to access API' to 'Disconnected by the system'
2024-03-12 08:06:07.198 DEBUG (MainThread) [custom_components.mydolphin_plus.managers.aws_client] AWS Client is disconnected
2024-03-12 08:06:07.209 DEBUG (MainThread) [custom_components.mydolphin_plus.common.base_entity] Data for binary_sensor_f0100yfg6m_dazza_aws_broker: {'is_on': False, 'attributes': {'status': <ConnectivityStatus.Disconnected: 'Disconnected by the system'>}}
...
2024-03-12 08:07:07.200 INFO (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Initializing MyDolphin API
2024-03-12 08:07:07.201 INFO (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Status changed from 'Connected to the API' to 'Establishing connection to API'
2024-03-12 08:07:09.062 DEBUG (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Status of https://mbapp18.maytronics.com/api/users/Login/: 200
...
2024-03-12 08:07:09.312 INFO (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Status changed from 'Establishing connection to API' to 'Connected with temporary API key'
...
2024-03-12 08:07:09.641 INFO (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Status changed from 'Connected with temporary API key' to 'Connected to the API'
2024-03-12 08:07:09.641 DEBUG (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Connected. Refresh details
2024-03-12 08:07:09.886 DEBUG (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Status of https://mbapp18.maytronics.com/api/serialnumbers/getrobotdetailsbymusn/: 200

Here’s the patch:

--- managers/aws_client.py.org	2024-02-22 09:07:45.833155383 +1000
+++ managers/aws_client.py	2024-03-10 14:22:12.813723089 +1000
@@ -146,15 +146,8 @@
         return self._data
 
     async def terminate(self):
         if self._awsiot_client is not None:
-            topics = self._topic_data.subscribe
-            _LOGGER.debug(f"Unsubscribing topics: {', '.join(topics)}")
-            for topic in self._topic_data.subscribe:
-                future, _packet_id = self._awsiot_client.unsubscribe(topic)
-
-                future.result()
-
             _LOGGER.debug("Disconnecting AWS Client")
 
             disconnect_future = self._awsiot_client.disconnect()
             disconnect_future.result()

jasoncodes avatar Mar 11 '24 22:03 jasoncodes

when clean_session of client set to False, it is required to unsubscribe all topics before reconnecting, removing those line will not help too much as this function is only get called when integration is being terminated (restart of HA / uninstall integration), there are too potential solutions as described in the closed PR (since code was added directly to develop branch and not using the PR):

  1. Change Clean Session to True and remove that unsubscribing process (preffered, that is pending re-alignment of branches)
  2. Extract function of that code of unsubscribe, place the call for it both in the terminate function as it was + before re-subscribing (if already subscribed)

elad-bar avatar Mar 22 '24 14:03 elad-bar

managers/aws_client.py already has clean_session=True: https://github.com/sh00t2kill/dolphin-robot/blob/3f9609ac4677de7869dc9659156e0cf3414131e9/custom_components/mydolphin_plus/managers/aws_client.py#L200

It’s why the AWS_ERROR_MQTT_CANCELLED_FOR_CLEAN_SESSION error is happening during teardown. This component has been reliable for me since removing the unnecessary unsubscribe code.

jasoncodes avatar Mar 22 '24 22:03 jasoncodes

Same with me. I manually deployed the change in my production HA after I saw it successfully disconnect and reconnect in my dev one.

sh00t2kill avatar Mar 25 '24 03:03 sh00t2kill

new release v1.0.13, please check and update

thanks

elad-bar avatar May 20 '24 06:05 elad-bar

any updates?

elad-bar avatar May 27 '24 13:05 elad-bar

Got this today, reloading the integration fixed it:

2024-05-31 07:51:38.451 ERROR (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Failed to post JSON to https://mbapp18.maytronics.com/api/serialnumbers/getrobotdetailsbymusn/, HTTP Status: Unauthorized (401)
2024-05-31 07:51:38.451 ERROR (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Status changed from 'Connected to the API' to 'Failed to access API'
2024-05-31 07:51:38.593 ERROR (MainThread) [custom_components.mydolphin_plus.managers.rest_api] Status changed from 'Establishing connection to API' to 'Invalid credentials'

Recently got this email from them, might be related:

"Dear Craig,

Your account security matters to us, so we're adding extra layers to keep your data safely protected.

We’re rolling out a One-Time Password (OTP) for the MyDolphin™ Plus app: a unique personalized code sent to you over email to log into the app.

Starting this week, you’ll need to use a OTP to access the app.

Here's how to switch to the new login:

Open the MyDolphin™ Plus app. Enter your email on the login screen. Check your inbox for an email from us with your personalized OTP. Enter the OTP into the app. A note about pairing your robot: 
If you logged in with the same email as before, you’ll be able to use the app immediately - no pairing needed

If you logged in before with 

  1. Bluetooth® connection OR Facebook, Google, or Apple, without providing Maytronics your current email

you’ll have to do the pairing again 

Note: For the next year, you won’t have to log in with OTP again. Just open the MyDolphin+ App and enjoy your Dolphin.

To keep your account secure, you’ll be asked to refresh your login once every 12 months.

Need help or have any questions? Please reach out to us for support.

All the best, Maytronics"

Scope666 avatar May 31 '24 13:05 Scope666

That's bad... thanks for sharing it will try to find a workaround

elad-bar avatar May 31 '24 13:05 elad-bar

That's bad... thanks for sharing it will try to find a workaround

Yeah, the way I read that email, it sounds like they're killing the passwords. Basically a weird form of 2FA.

Scope666 avatar May 31 '24 14:05 Scope666

I will try to contact them again and ask to invest more in our community instead of blocking us, this moght improve the situation of the stock a bit...

elad-bar avatar May 31 '24 15:05 elad-bar

is this issue still relevant?

thanks

elad-bar avatar Jun 23 '24 10:06 elad-bar