PyLoxone

Error doing job: Task exception was never retrieved

Open tegner23 opened this issue 11 months ago • 28 comments

Describe the bug

Losing connection after 2-3 hours running the integration. Also refresh of the integration doesn't fix the issue and a full reboot is necessary to get the entities running again.

Firmware of your Miniserver

14.5.12.7

HomeAssistant install method

Pi5, Hassio

Version of HomeAssistant

2024.3.3

Version of Pyloxone

0.6.3

Update pyloxone

yes

Log

Logger: homeassistant
Source: custom_components/loxone/api.py:325
Integration: PyLoxone (documentation, issues)
First occurred: 20:44:32 (2 occurrences)
Last logged: 21:24:39

Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1302, in close_connection
    await self.transfer_data_task
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 959, in transfer_data
    message = await self.read_message()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1029, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1104, in read_data_frame
    frame = await self.read_frame(max_size)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1161, in read_frame
    frame = await Frame.read(
            ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/framing.py", line 68, in read
    data = await reader(2)
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/streams.py", line 752, in readexactly
    await self._wait_for_data('readexactly')
  File "/usr/local/lib/python3.12/asyncio/streams.py", line 545, in _wait_for_data
    await self._waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/config/custom_components/loxone/api.py", line 325, in keep_alive
    await self._ws.send("keepalive")
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 635, in send
    await self.ensure_open()
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 935, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1011 (unexpected error) keepalive ping timeout; no close frame received

tegner23 avatar Mar 25 '24 20:03 tegner23

@tegner23 a refresh is not working if the connection is lost.

I do not know why this error occurs. Is it a gen2?

JoDehli avatar Mar 25 '24 21:03 JoDehli

@tegner23 I really think it is a Home Assistant problem. https://github.com/home-assistant/core/issues/114196

There are more issues with other websocket integrations. Nothing has changed on the Loxone or PyLoxone side.

JoDehli avatar Mar 25 '24 21:03 JoDehli

I had the same issue, and restarting Home Assistant made it work again. I don't know how long it will keep working! My Loxone is generation 1.

williamsjou avatar Mar 31 '24 12:03 williamsjou

I use a Gen2 Server.

Is it possible to trigger the keepalive ping from an automation that runs every hour, or would that have no effect? The workaround I use at the moment is a full reboot of the host system every 6 hours, but this doesn't help either, as it would need to run every 2 to 3 hours...

tegner23 avatar Apr 02 '24 13:04 tegner23

The problem is not the keepalive ping; the problem is that you lose the connection. You should try to make the connection stable. Do you have the Miniserver on a different network? Once the connection has been lost, you have to restart Home Assistant. That is how it is at the moment.

JoDehli avatar Apr 02 '24 16:04 JoDehli

Is there any news regarding this topic? I have the exact same problem. The linked core issue above states that it should be fixed in 2024.4. I have 2024.4.3 and the error is still there. Is there maybe a chance of getting a switch to reconnect manually without rebooting? This could be used in an automation.

Some details about my system.

  • Gen 1
  • Same network
  • Pi 5
  • Hassio

MephistoJB avatar Apr 15 '24 12:04 MephistoJB

@MephistoJB this error occurs only if your connection is unstable. I think it will not be possible to improve that with the current implementation.

JoDehli avatar Apr 15 '24 12:04 JoDehli

Hi @JoDehli thx for the reply. I have difficulty understanding what "unstable" means. HA and Loxone are in the same network and both are wired. Do you have an idea what could cause such instability? Also, this problem only started a few weeks ago. I am lost here 😅

MephistoJB avatar Apr 15 '24 13:04 MephistoJB

@MephistoJB I mean that the websocket connection is interrupted for a short time. When this happens, the complete connection routine should normally be started again. That is not how it is implemented at the moment. I think I will not implement it, because I do not have the problem myself and, most importantly, I do not have the time. Sorry.
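To make "start the complete connection routine again" concrete, here is a minimal sketch in plain asyncio. It is not PyLoxone's code: `run_with_reconnect` and `session` are hypothetical names, and `session` stands in for whatever coroutine connects, authenticates, and listens. The sketch assumes the session raises the built-in `ConnectionError` when the link drops; with the websockets library one would probably catch `websockets.exceptions.ConnectionClosed` instead, which is the base class of the `ConnectionClosedError` seen in the log.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_reconnect(
    session: Callable[[], Awaitable[None]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> int:
    """Run `session` and start it again whenever the connection drops.

    `session` is a hypothetical coroutine function that performs the full
    connection routine and raises ConnectionError when the link is lost.
    Returns the number of restarts that were needed.
    """
    restarts = 0
    delay = base_delay
    while True:
        try:
            await session()
            return restarts  # session ended cleanly, nothing to retry
        except ConnectionError:
            if restarts >= max_attempts:
                raise  # give up and let the caller decide what to do
            restarts += 1
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

The backoff keeps a Miniserver that is still rebooting from being hammered with connection attempts. For simplicity the delay is never reset, so a long-lived integration would likely want to reset it after a session has been stable for a while.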

I do not know exactly why these problems have become more frequent since the last Home Assistant update. Maybe because they changed the version of the websocket library. Ok

JoDehli avatar Apr 15 '24 13:04 JoDehli

I've been having this issue for months (maybe more than a year). It occurs, without exception, whenever a power (and maybe network) outage causes the Miniserver to restart. It works fine after an HA restart.

Elijen avatar Apr 17 '24 18:04 Elijen

Yes, this is a problem, and at the moment there is no solution for it. The websocket connection does not report a lost connection, so there is no way to reconnect.

JoDehli avatar Apr 17 '24 18:04 JoDehli

@JoDehli Shouldn't I be able to catch the Exception that is logged by HA?
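For context, asyncio logs "Task exception was never retrieved" when a task ends with an exception that nothing ever awaits or reads. A minimal sketch of catching it, assuming a background keepalive task roughly like the `keep_alive` frame in the traceback: `watch`, `on_error`, and the `send` parameter are hypothetical names for illustration, not PyLoxone's API.

```python
import asyncio
from typing import Awaitable, Callable


async def keep_alive(send: Callable[[str], Awaitable[None]], interval: float) -> None:
    """Send a keepalive message forever; raises as soon as sending fails."""
    while True:
        await send("keepalive")
        await asyncio.sleep(interval)


def watch(task: asyncio.Task, on_error: Callable[[BaseException], None]) -> asyncio.Task:
    """Report a task's exception to `on_error` instead of leaving it unretrieved."""

    def _done(finished: asyncio.Task) -> None:
        if finished.cancelled():
            return  # a cancelled task has no exception to report
        exc = finished.exception()  # reading it marks the exception as retrieved
        if exc is not None:
            on_error(exc)

    task.add_done_callback(_done)
    return task
```

Usage would look like `watch(asyncio.create_task(keep_alive(ws.send, 30)), handle_lost_connection)`, where `handle_lost_connection` is whatever schedules a reconnect.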

Elijen avatar Apr 17 '24 18:04 Elijen

For me it happens independently of a power outage. It just happens every few hours.

But I found a possible workaround. I created a dummy switch in Loxone and an automation in HA that toggles the dummy switch every minute. This has been stable for 48 hours so far. I will now try to make the automation trigger less often and observe what happens.

MephistoJB avatar Apr 18 '24 10:04 MephistoJB

For me, the workaround of fully rebooting the host system every 4 hours works fine. In my case I do not need any further steps within Loxone Config. If the interval is longer, there is a risk that the failure happens again.

@MephistoJB, Please let us know as soon as you have new information

tegner23 avatar Apr 18 '24 13:04 tegner23

@tegner23 @MephistoJB @williamsjou

I tried to catch the error in the new 0.6.5 pre-release. For me it is difficult, because on my small installation I really have no connection problems. I tried to fix it so that you do not need special hacks.

In general, I know that the current implementation is not perfect. I started this project for myself and, as I said, I have a very small installation. I try to implement as much as I can for other users, but my time is limited, especially because it is even more time-consuming if you do not have the devices in your own installation.

I am currently working on a complete rewrite of this project. I still have some work to do, but when it is finished it should be more robust (hopefully). Until then, I hope this release fixes the strange problems that appeared one HA release ago.

JoDehli avatar Apr 20 '24 08:04 JoDehli

@JoDehli your work is much appreciated. Thx for that. I can imagine that it is a tough job.

Thx also for the new release. I will try it right now and report.

Having said that. How can I install the new release? It doesn't show up in HACS

MephistoJB avatar Apr 20 '24 09:04 MephistoJB

I do not know exactly why these problems have become more frequent since the last Home Assistant update. Maybe because they changed the version of the websocket library. Ok

I have had this issue ever since I started using PyLoxone (~1.5 years ago), using a Pi4 and a Gen 1 Miniserver, always on the latest versions. My Pi4 is on WLAN, while the Miniserver is on LAN. I will try to use LAN on the Pi4 as well and report whether this makes the link more stable.

CodeMartn avatar Apr 21 '24 14:04 CodeMartn

@JoDehli It seems like you did a pretty good job! :) Since updating to 0.6.4 (20.04.2024) I do not need any reboots to keep the integration alive. I have now created a few automations running every hour to check if the failure shows up again.

Thank you very much for your effort. Let me know how you would like to continue with that bug.

tegner23 avatar Apr 24 '24 19:04 tegner23

@tegner23 thanks for your feedback. It would be great if you could provide error logs when an error is raised and causes a reconnect. I am also interested in how often this happens. I will leave the issue open for a little while. Maybe the other users can also provide some feedback.

I try to finish the new implementation. There I also have to implement a stable reconnect mechanism for such problems. If it is finished maybe you can test it.

JoDehli avatar Apr 27 '24 14:04 JoDehli

Sorry for the late answer. To be honest, I forgot to check HA over the last days/weeks, since there have been zero problems. This is a very good sign, isn't it?

Thanks a lot. Good job.

MephistoJB avatar Apr 27 '24 15:04 MephistoJB

I can report that the error does not occur in my setup on 0.6.6 but does occur frequently when I install 0.6.7. Running on a Mini Server 2 with SW version 14.5.12.7

I want to thank you for all the work in PyLoxone, it is a great integration.

j-nordt avatar May 05 '24 19:05 j-nordt

@j-nordt the connection code is the same in both versions. Are you sure that it is not another error?

JoDehli avatar May 06 '24 04:05 JoDehli

same here

  • Loxone Gen1 Miniserver
  • Core 2024.5.5
  • Supervisor 2024.05.1
  • Operating System 12.3
  • Frontend 20240501.1
2024-05-25 16:32:16.065 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 1301, in close_connection
    await self.transfer_data_task
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 974, in transfer_data
    await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/config/custom_components/loxone/miniserver.py", line 183, in listen_loxone_send
    await self.api.send_websocket_command(device_uuid, value)
  File "/config/custom_components/loxone/api.py", line 373, in send_websocket_command
    await self._ws.send(command)
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 635, in send
    await self.ensure_open()
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/protocol.py", line 939, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1011 (internal error) keepalive ping timeout; no close frame received

feffi avatar May 25 '24 14:05 feffi

I'm still getting this error. In my understanding, the issue is that there are 2 self._ws.send(command) statements that are not enclosed in a try/except, so any error thrown by the websockets library is not caught.

https://github.com/JoDehli/PyLoxone/blob/master/custom_components/loxone/api.py#L373 https://github.com/JoDehli/PyLoxone/blob/master/custom_components/loxone/api.py#L383

@JoDehli maybe this could fix it?

try:
    await self._ws.send(command)
except Exception as e:
    _LOGGER.error(e)
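Logging alone would stop the unhandled-exception message, but the connection would still be dead. A slightly extended sketch of the same idea is shown below, where a failed send also flags that a reconnect is needed. `GuardedSender` and `reconnect_needed` are hypothetical names, and the wrapped object is assumed to be anything with an async `send(message)` method; this is not the integration's actual code.

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


class GuardedSender:
    """Wrap a websocket-like object so that failed sends request a reconnect."""

    def __init__(self, ws) -> None:
        self._ws = ws  # assumption: exposes `async def send(message)`
        self.reconnect_needed = asyncio.Event()

    async def send(self, command: str) -> bool:
        """Return True if the command was sent, False if the send failed."""
        try:
            await self._ws.send(command)
            return True
        except Exception as exc:
            # asyncio.CancelledError derives from BaseException, so task
            # cancellation still propagates and is not swallowed here.
            _LOGGER.error("Sending %r failed: %s", command, exc)
            self.reconnect_needed.set()
            return False
```

A separate task could then `await sender.reconnect_needed.wait()` and rerun the connection routine, which keeps the reconnect logic out of every call site.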

gigatexel avatar Jul 14 '24 16:07 gigatexel

I am also having this issue. I'm not sure if I have further problems, because my lights have never worked since the very beginning: they are all reported as turned on and I cannot change their state. Shading works, but the integration stops working after some hours. Rebooting HA does fix the problem, so it has to do with how HA handles or reports connections, or who knows; HA is not exactly robust.

danielo515 avatar Aug 04 '24 16:08 danielo515

@danielo515 are you sure that you are on the latest version?

JoDehli avatar Aug 04 '24 16:08 JoDehli

@danielo515 are you sure that you are on the latest version?

No, I was not. After updating to the latest version everything is working perfectly. Many thanks. Let's see if it keeps like that.

danielo515 avatar Aug 04 '24 23:08 danielo515