ha-edgeos
ha-edgeos copied to clipboard
daily disconnects
I get daily disconnects , reload integration fixes the issue.
running on v2.0.12
This error originated from a custom integration.
Logger: custom_components.edgeos.component.api.websocket
Source: custom_components/edgeos/component/api/websocket.py:133
Integration: Ubiquiti EdgeOS Routers (documentation, issues)
First occurred: October 18, 2022 at 19:21:02 (1 occurrences)
Last logged: October 18, 2022 at 19:21:02
Failed to connect WS, Error: 503, message='Invalid response status', url=URL('wss://192.168.3.1/ws/stats'), Line: 109
October 19, 2022
ROUTER Interface ETH0 Status was connected
08:46:50 - 14 minutes ago
ROUTER Interface ETH0 Status was connected
08:34:09 - 27 minutes ago
ROUTER Interface ETH0 Status became unavailable
08:33:59 - 27 minutes ago
October 18, 2022
ROUTER Interface ETH0 Status was connected
07:21:45 - Yesterday
ROUTER Interface ETH0 Status became unavailable
07:21:35 - Yesterday
October 17, 2022
ROUTER Interface ETH0 Status was connected
07:55:17 - 2 days ago
at the same time, ping integration to google dns keeps running during the events.
I'll add debug as well to see what causes this.
this happens due to service disconnections by the router, the component connectivity recovers from it after 30 seconds, in old versions of component, that warning message was not written unless it failed 3 times in row, now it logs every time it happens.
it also happened for me twice over the last week, again, this is not related to the component (can happen also with the UI of EdgeOS, but again, it recovers from it as well)
btw, you don't need to reload as this is part of the recovery mechanism
everythings stays disconnected, until I reload. for instance eth0 graph stays at 0 for hours. so this is not the only sy,mptom of the issue I face.
data:image/s3,"s3://crabby-images/601d6/601d6bcc3113732d5ee635b9139816800f4eee78" alt="image"
please provide debug log level for that, I will need to understand why it's not recovering (without additional warnings)
I will, once it happens again, so I expect tomorrow.
thanks
@poonab mentioned there is a log message of Update in progress, will skip the request
which can cause it, please let me know if it happens to you as well, it can be a great lead
@poonab, are there any log messages with the following pattern before / after it happens (under the assumption log level is set to INFO / DEBUG): WS Status changed to ***, API Status: ***?
released new version with improved connectivity, set the message to e displayed only when the connection was not created, in case of disconnection, based on the error, in addition, fixed the ping message which could of cause it since it was not working.
data:image/s3,"s3://crabby-images/aaa93/aaa93e82b581363646880c99dea5c8ae180713ea" alt="image"
some logs: https://pastebin.com/1AAKNK1Y
struggling to paste bigger log, as debug creates a TON :-)
the log you have sent is from 07:43:02, i cannot understand anything with log of one second :) please upload the entire log begins before it stopped working
in addition, I added 3 new sensors of messages: received, ignored and error, can you please check the received messages sensor's graph?
thanks
log file:
https://wetransfer.com/downloads/24ee030959579cfb47bb74f7263d9ae220221020061726/7d89821307cdde029044ff7737bd339820221020061758/b7b95a
I don't have those new sensors yet, as I still ran on 0.12 when the issue occured. I will update to it now.
I will try getting to it by evening, with the latest version (v2.0.13) - enough to set the logs for info and above to understnad the issue
Happened again last night at around 0:00, currently on 2.0.14 home-assistant.log.gz
Is it happening at the same time every night?
the last 2 nights, yes, at midnight, and the night before at 19:20
Mine also failed again @ 21:54:40, not happening at the same time. Errors on HA are
This error originated from a custom integration.
Logger: homeassistant
Source: custom_components/edgeos/component/api/websocket.py:158
Integration: Ubiquiti EdgeOS Routers (documentation, issues)
First occurred: 22 October 2022 at 21:54:40 (1101 occurrences)
Last logged: 13:11:36
Error doing job: Task exception was never retrieved
Traceback (most recent call last):
File "/config/custom_components/edgeos/component/managers/home_assistant.py", line 83, in async_send_heartbeat
await self.ws.async_send_heartbeat()
File "/config/custom_components/edgeos/component/api/websocket.py", line 158, in async_send_heartbeat
await self._ws.send_str(data)
File "/usr/local/lib/python3.10/site-packages/aiohttp/client_ws.py", line 151, in send_str
await self._writer.send(data, binary=False, compress=compress)
File "/usr/local/lib/python3.10/site-packages/aiohttp/http_websocket.py", line 690, in send
await self._send_frame(message, WSMsgType.TEXT, compress)
File "/usr/local/lib/python3.10/site-packages/aiohttp/http_websocket.py", line 601, in _send_frame
raise ConnectionResetError("Cannot write to closing transport")
ConnectionResetError: Cannot write to closing transport
This error originated from a custom integration.
Logger: custom_components.edgeos.core.managers.home_assistant
Source: custom_components/edgeos/core/managers/home_assistant.py:244
Integration: Ubiquiti EdgeOS Routers (documentation, issues)
First occurred: 20 October 2022 at 04:12:41 (55 occurrences)
Last logged: 01:17:16
Update in progress, will skip the request
Skipping through the logs I cant see anything weird included but at 21:54 all the data that continues to be logged going forward is no longer updating the values
That is not the same error is reported in this issue, another issue, saw it as well and this one already has a fix
please check v2.0.15, should improve dramtically the stability, for me 3 days without a single warning / error
thanks
I upgraded to v2.0.16 yesterday, happened again last night. INFO level log doesn't show much, just that after 1s. past midnight no values change anymore. 192.168.3.9-20221029-095831.log
According to the log you have sent, there are updates up until the last log line: 2022-10-29 00:00:03.101 INFO (SyncWorker_30) [custom_components.edgeos.core.managers.entity_manager] ROUTER Interface ETH0 Received Traffic (sensor) updated, state: 527423484.475 | updated
in addition, i don't see a single disconnection within that log, please use debug log level to get more details for the same period, the fact it happens on dialy basis around the same time is weird, happens to me in the past because lots of load on the DNS server I ran (pi-hole), could it be that you are using HASS.IO / AdGuard / PiHole?
thanks
I enabled debug yesterday, included the log from 23:58-00:02 192.168.3.9-20221030-102935.zip
I run pi-hole but not on HA, it's on a seperate VM.
Will review the log file later today, thanks.
Additional direction, you might have the same issue I had around a year ago when the pihole was overloaded at specific time which caused many unexplained issues related to networking, Please check how to increase the number of requests in parallel to the pihole.
I added some monitoring to esxi yesterday, it seems it's not busy on the cpu's at the time:
also not on HA:
though it is currious to see that it drop cpu cycles right at that time...
is the second screen shot is from the component of EdgeOS CPU?
thanks
no, overal HA processor usage. I'm putting an snmp probe in for the edgeos as we speak. and I'm automating the reloading for now at 00:30.
Can you share the screen shot of the EdgeOS CPU / Memory sensors?
Thanks