TPlink Kasa devices constantly go offline in Home Assistant

AV0uu opened this issue on Oct 09 '24

The problem

TP-Link Kasa Wi-Fi switches (HS200, HS210, KS200), in-wall outlets (KP200), and power strips (KP303) show as offline in Home Assistant despite working in the Kasa app. These devices go online and offline seemingly at random.

What version of Home Assistant Core has the issue?

core-2024.10.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

TP-Link Smart Home

Link to integration documentation on our website

(https://www.home-assistant.io/integrations/tplink)

Diagnostics information

home-assistant_tplink_2024-10-09T16-52-23.108Z.log

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

AV0uu • Oct 09 '24 16:10

Hey there @rytilahti, @bdraco, @sdb9696, mind taking a look at this issue as it has been labeled with an integration (tplink) you are listed as a code owner for? Thanks!

home-assistant[bot] • Oct 09 '24 18:10

I have Kasa devices (HS100, HS110) and they have been offline for almost a year. I gave up on this integration, tbh.

Gherry777 • Oct 17 '24 11:10

I have 60 devices. 8 of them have just started going offline consistently (they still work in the app), whereas the others are unaffected. Strangely enough, they are also all near each other in the house. If I reset a switch with its button, it reconnects and is good for a day or two. It's always the same 8 devices.

rtech73 • Oct 18 '24 15:10

Just to be clear: my 5 smart plugs work perfectly with the app or via Amazon Alexa. There are zero problems, always online and my network is perfect (all devices with static IP addresses).

It's just this integration that always has problems; I ended up removing it from HA. I have 80 other devices (plugs/lights/sensors/valves) on different protocols (Wi-Fi/Zigbee/Bluetooth), and the only problem is with the TP-Link HS100/HS110. All the time.

They drop off the network (only in HA; they are obviously still there) and the system keeps trying to reconnect every 5 minutes, failing each time. Since this problem started almost 8 months ago, I think either this integration isn't maintained anymore, or nobody cares about Kasa plugs.

Gherry777 • Oct 18 '24 15:10

Thank you for your perspectives, I thought I was going nuts with the trouble I was having.

To do a little independent investigation (maybe it was a hardware issue?), I wrote a Python script using python-kasa to:

  1. Discover all my Kasa devices, to see whether the number discovered matched the number expected;
  2. Get a 'feature' list from the devices; and
  3. Attempt to force a 'reboot' via Python, since manually pressing the reboot button on the switches clears up the connection issues for a day or so (as rtech73 stated).

While all of the devices respond in the Kasa and Tapo apps on the iPhone (iOS 18), the Python script throws errors when calling 'device.update()' and 'device.reboot()' on the IPs of the Kasa plugs and switches that show as offline in HA.
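
For anyone who wants to repeat step 1 without python-kasa, here is a minimal standard-library sketch of the legacy discovery broadcast. It assumes older (IOT-generation) Kasa devices, which answer a get_sysinfo query obfuscated with the XOR autokey cipher (initial key 171) on UDP port 9999, the port that also appears in the HA logs further down. Newer devices that use the KLAP/AES transport will not answer it (python-kasa handles those separately), and the function names here are hypothetical. It is most informative when run on the HA host itself, since that machine's local reachability is what is in question.

```python
import json
import socket

DISCOVERY_PORT = 9999  # legacy Kasa (IOT) protocol port
SYSINFO_QUERY = {"system": {"get_sysinfo": {}}}


def xor_encrypt(plain: bytes) -> bytes:
    """XOR autokey cipher: each output byte becomes the key for the next."""
    key = 171
    out = bytearray()
    for byte in plain:
        key ^= byte
        out.append(key)
    return bytes(out)


def xor_decrypt(cipher: bytes) -> bytes:
    """Inverse of xor_encrypt."""
    key = 171
    out = bytearray()
    for byte in cipher:
        out.append(key ^ byte)
        key = byte
    return bytes(out)


def discover(broadcast: str = "255.255.255.255", timeout: float = 3.0) -> dict:
    """Broadcast one get_sysinfo query and return {ip: decoded reply}."""
    found = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)  # stop once no reply arrives for `timeout` s
        packet = xor_encrypt(json.dumps(SYSINFO_QUERY).encode())
        sock.sendto(packet, (broadcast, DISCOVERY_PORT))
        while True:
            try:
                data, (ip, _port) = sock.recvfrom(8192)
            except socket.timeout:
                break
            try:
                found[ip] = json.loads(xor_decrypt(data))
            except ValueError:
                continue  # not a legacy Kasa reply; ignore it
    return found


def report() -> None:
    """Print model and alias for every device that answered, then the count."""
    devices = discover()
    for ip, reply in sorted(devices.items()):
        info = reply.get("system", {}).get("get_sysinfo", {})
        print(ip, info.get("model"), info.get("alias"))
    print(len(devices), "devices answered")
```

Calling report() and comparing the printed count with the number of devices you expect is the check described in step 1; any device missing from the list but working in the app is one the HA host cannot currently see locally.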

TLDR: Is it a python-kasa issue?

AV0uu • Oct 18 '24 16:10

Hi @AV0uu. From looking through your logs, it seems that at the time your tplink devices are going offline, you also have multiple other device types going offline with different integrations. It appears there are either general network issues happening at the same time, or network issues specifically with the device running HA.

There are a few reasons why the kasa and tapo apps appear to be working ok. HA connects locally to the devices, whereas if the devices have access to the internet the native apps tend to connect via the cloud. It could be that the HA box is experiencing the issues and it is not affecting the devices going directly through your router to the cloud. Also the tplink integration reports unavailable as soon as the device becomes unavailable (within 5 seconds), whereas the native apps generally don't tell you when they can't connect until a lot longer.
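
To make the timing argument concrete, here is a toy model; the class is hypothetical and is not the integration's coordinator code. A poller that reports "unavailable" after the first failed poll flags every short blip, while one that waits for several consecutive failures (closer to how the apps are described as behaving) hides them. The 5-second poll interval comes from the comment above; the threshold of 3 is an arbitrary illustration, since the apps' real grace period is not stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityTracker:
    """Toy model: report a device unavailable only after
    `max_failures` consecutive failed polls."""

    max_failures: int = 1
    failures: int = 0

    def record(self, poll_ok: bool) -> bool:
        """Record one poll result; return the availability that would be shown."""
        self.failures = 0 if poll_ok else self.failures + 1
        return self.failures < self.max_failures


# One entry per poll (e.g. every 5 s); False means the device did not answer.
polls = [True, False, True, False, False, False, True]
strict = AvailabilityTracker(max_failures=1)   # flags the very first miss
lenient = AvailabilityTracker(max_failures=3)  # hides blips shorter than 3 polls
strict_view = [strict.record(p) for p in polls]
lenient_view = [lenient.record(p) for p in polls]
print(strict_view)   # [True, False, True, False, False, False, True]
print(lenient_view)  # [True, True, True, True, True, False, True]
```

Both trackers see exactly the same failures; only what they report differs.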

sdb9696 • Oct 18 '24 17:10

@Gherry777 please open a new issue and include some debug logs if you want assistance. This integration is well maintained and we are happy to help with issues when there are constructive contributions.

@rtech73 we have had some issues reported where they have turned out to be problems with certain access points on mesh networks. Tweaking the wireless protocols has been reported as sometimes fixing these issues.

sdb9696 • Oct 18 '24 17:10

I have 11 Kasa KL125 bulbs, and since 2024.10.2 they've been going offline in Home Assistant. In the app they are always online and consistently, immediately responsive. The fact that the app works instantly on all the bulbs that show offline in Home Assistant tells me that this is definitely a Home Assistant issue.

One person in another thread mentioned IPs changing in DHCP, but I've since moved all of them to static IPs.

This is what I see in Home Assistant: [screenshot]. Again, all of these have a solid signal to the access point, are available in the Kasa app, and respond instantly to changes from Kasa. Some of the behavior I see when using Home Assistant:

When turning a group of 4 bulbs on or off, I see the following behavior:

  • Not all of them toggle; usually only 2 or 3 out of 4 do.
  • One may blink at full brightness for about a tenth of a second every 3 to 4 seconds.
  • Sometimes they come on at different brightnesses.

YAML for my toggle action:

metadata: {}
data:
  brightness_pct: 100
target:
  area_id: office
  entity_id: light.office_lights
action: light.toggle

light.office_lights is a group entity containing office light 1-4

Adding log entries: I see lines like these over and over for all of my bulbs:

2024-10-18 12:13:17.806 ERROR (MainThread) [homeassistant.components.tplink.coordinator] Error fetching 10.1.11.19 data: Unable to query the device 10.1.11.19:9999:
2024-10-18 12:13:17.807 WARNING (MainThread) [homeassistant.components.group.sensor] Unable to use state. Only numerical states are supported, entity sensor.office_light_1_current_consumption with value unavailable excluded from calculation in sensor.lights_current_energy_usage
2024-10-18 12:13:17.807 WARNING (MainThread) [homeassistant.components.group.sensor] Unable to use state. Only numerical states are supported, entity sensor.office_light_1_today_s_consumption with value unavailable excluded from calculation in sensor.lights_today_s_usage
2024-10-18 12:13:18.005 ERROR (MainThread) [homeassistant.components.tplink.coordinator] Error fetching 10.1.11.10 data: Unable to query the device 10.1.11.10:9999:
2024-10-18 12:13:18.007 WARNING (MainThread) [homeassistant.components.group.sensor] Unable to use state. Only numerical states are supported, entity sensor.office_light_4_current_consumption with value unavailable excluded from calculation in sensor.lights_current_energy_usage
2024-10-18 12:13:18.007 WARNING (MainThread) [homeassistant.components.group.sensor] Unable to use state. Only numerical states are supported, entity sensor.office_light_4_today_s_consumption with value unavailable excluded from calculation in sensor.lights_today_s_usage
2024-10-18 12:13:27.946 ERROR (MainThread) [homeassistant.components.tplink.coordinator] Error fetching 10.1.11.11 data: Unable to query the device 10.1.11.11:9999:
2024-10-18 12:13:27.947 WARNING (MainThread) [homeassistant.components.group.sensor] Unable to use state. Only numerical states are supported, entity sensor.bedroom_light_2_current_consumption with value unavailable excluded from calculation in sensor.lights_current_energy_usage
2024-10-18 12:13:27.947 WARNING (MainThread) [homeassistant.components.group.sensor] Unable to use state. Only numerical states are supported, entity sensor.bedroom_light_2_today_s_consumption with value unavailable excluded from calculation in sensor.lights_today_s_usage
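
When many entities are involved, it can help to tally which addresses the coordinator fails to reach, for example to confirm whether it is "always the same 8 devices". A minimal sketch, assuming the log lines keep the format quoted above; the regular expression and function names are illustrative and not part of Home Assistant.

```python
import re
from collections import Counter

# Matches the coordinator errors quoted above, capturing the device address.
FETCH_ERROR = re.compile(
    r"\[homeassistant\.components\.tplink\.coordinator\] "
    r"Error fetching (\S+) data"
)


def failing_hosts(lines) -> Counter:
    """Count tplink coordinator fetch errors per device address."""
    counts = Counter()
    for line in lines:
        match = FETCH_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def failing_hosts_in_file(path: str) -> Counter:
    """Same as failing_hosts, reading a saved log file line by line."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return failing_hosts(log)
```

For example, failing_hosts_in_file("home-assistant.log").most_common(10) would list the ten addresses with the most errors; the file name is an assumption, so point it at wherever the log was saved. The group-sensor warnings are ignored because they are a consequence of the fetch errors, not a separate failure.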

welborn • Oct 18 '24 17:10

Thank you for taking the time to explain. I understand that the apps would be slower to show unavailability; perhaps the log I sent is not telling the whole story. The devices stay offline in HA for hours or even a day while I am able to operate them in the apps. For instance, right now the 'Fireplace Top' and 'Fireplace Bottom' entities of my 'Fireplace Outlet' both show offline in HA, and have for hours, but I can operate the outlet via the Kasa app.

AV0uu • Oct 18 '24 17:10

@welborn please open a new issue and include some debug logs.

sdb9696 • Oct 18 '24 17:10

Yes, but as I said, there are many devices in your HA instance reporting as unavailable. You could enable debug logs for kasa, which would give us more detail, but I also think you should try to figure out whether any of your custom integrations are periodically hosing your HA instance.

sdb9696 • Oct 18 '24 18:10

Will do!

AV0uu • Oct 18 '24 18:10

home-assistant_tplink_2024-10-18T18-21-36.576Z.log

Debug log attached (I had to cut it down to fit the upload size).

AV0uu • Oct 18 '24 18:10

Were the devices unavailable during this logging? I don't see any tplink errors.

sdb9696 • Oct 18 '24 18:10

Some Kasa devices were available, others were unavailable. I just went through and deleted or disabled a bunch of HACS integrations to see if that would help, but it appears to have had no effect.

AV0uu • Oct 18 '24 18:10

home-assistant_tplink_2024-10-18T18-50-52.501Z.log

Four devices were offline during this debug log period (all of them KP200 outlets, which seem to suffer from the problem more than the switches).

AV0uu • Oct 18 '24 18:10

At the risk of shouting into the dark: I am following up a couple of weeks after attempting to mitigate some of these issues, for anyone searching for the same problem in the future (https://xkcd.com/979/).

I was able to improve device availability in HA by:

  1. Assigning a static IP address to each device.
  2. Removing the single-pole switches and replacing them with Z-Wave ones (fewer devices on Wi-Fi).
  3. Removing most of the HACS and Add-On integrations that I absolutely did not need.
  4. Writing a python script to send a 'reboot' to the kasa devices on my network that I run a couple times per week.
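
Item 4 can be done with python-kasa, whose device objects provide the reboot() method used earlier in this thread. For reference, here is a standard-library sketch of the same request over the legacy TCP protocol. It assumes IOT-generation devices listening on port 9999 (as in the logs earlier in this thread), the XOR autokey cipher with initial key 171, a 4-byte big-endian length prefix on every message, and the legacy {"system": {"reboot": {"delay": ...}}} command. The function names are hypothetical, and newer KLAP/AES devices need python-kasa instead.

```python
import json
import socket
import struct

PORT = 9999  # legacy Kasa (IOT) TCP port


def xor_encrypt(plain: bytes) -> bytes:
    """XOR autokey cipher with initial key 171."""
    key = 171
    out = bytearray()
    for byte in plain:
        key ^= byte
        out.append(key)
    return bytes(out)


def xor_decrypt(cipher: bytes) -> bytes:
    """Inverse of xor_encrypt."""
    key = 171
    out = bytearray()
    for byte in cipher:
        out.append(key ^ byte)
        key = byte
    return bytes(out)


def frame(request: dict) -> bytes:
    """Encrypt a JSON request and prefix its 4-byte big-endian length."""
    payload = xor_encrypt(json.dumps(request).encode())
    return struct.pack(">I", len(payload)) + payload


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, or raise if the device hangs up early."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("device closed the connection")
        buf += chunk
    return buf


def send_request(host: str, request: dict, timeout: float = 5.0) -> dict:
    """Send one request and return the decoded JSON reply."""
    with socket.create_connection((host, PORT), timeout=timeout) as sock:
        sock.sendall(frame(request))
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        return json.loads(xor_decrypt(recv_exact(sock, length)))


def reboot_all(hosts, delay: int = 1) -> dict:
    """Ask each host to reboot after `delay` seconds; collect replies or errors."""
    results = {}
    for host in hosts:
        try:
            results[host] = send_request(host, {"system": {"reboot": {"delay": delay}}})
        except (OSError, ValueError) as err:
            results[host] = f"failed: {err}"
    return results
```

Calling reboot_all(["192.0.2.10", "192.0.2.11"]) with your own static IPs from a scheduled task would reproduce the "couple of times per week" routine; a "failed" entry in the results is itself a hint that the host running the script cannot reach that device locally.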

I would estimate that I am now seeing 80% to 90% of the Kasa devices in HA fairly consistently, but rarely 100% all at once. The worst offenders are still the in-wall outlets (KP200s) as well as some of the older 3-way switches (which are the only ones that have 2-traveler architecture).

One challenge was the TP-Link mesh router's (XE75) feature for identifying and controlling the Kasa switches. I think this was part of the issue, and it made assigning static IPs overly difficult (I had to type in the MACs manually).

A reboot device button in HA would be very welcome!

Thank you all for taking the time to advise me about this issue, especially sdb9696.

TLDR: There was definitely something to the assertion that network issues were involved, but I still think something else is also contributing.

AV0uu • Nov 04 '24 18:11

Thanks for your insights, @AV0uu! There are many variables, and they also differ between device families, so pinning down the exact cause is rather complicated, as you have noticed.

I feel that one of the most common issues is related to overly strict network configuration combined with device address changes, so I'm going to describe how this works. In order to update the config entry on address changes, the integration leverages L2 connectivity:

  • UDP broadcasts that are sent out on the network interfaces configured in Home Assistant, and
  • DHCP communications, which require a matching entry in the manifest file for the device's MAC address ~~(and notably for its host name, too!)~~

Both of these require that the Home Assistant instance is running on the same network, or at least has direct access to it and is configured to use the separate network adapters. The documentation could be improved to clarify this, so PRs are welcome!

Now, while I was writing this comment, I started to wonder whether our hostname-based filtering might be the cause of some of these woes. It may be worth investigating whether it would be fine to move away from hostname-based matching and instead perform a connectivity check for all known MAC address prefixes. This would also relieve us, the maintainers, from trying to keep the list up to date.
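
To illustrate the difference being discussed, here is a toy matcher in the spirit of the manifest's DHCP entries, which pair a hostname pattern with a MAC-address prefix pattern. The patterns below are illustrative placeholders rather than the integration's real list, and the functions are hypothetical. Per the edit at the end of this comment, already registered devices skip the hostname check, so the strict version matters mainly for initial discovery.

```python
from fnmatch import fnmatchcase

# Illustrative placeholders only; the real entries live in the tplink
# integration's manifest.json.
MATCHERS = [
    {"hostname": "hs*", "macaddress": "50C7BF*"},
    {"hostname": "k[lps]*", "macaddress": "1C3BF3*"},
]


def normalize_mac(mac: str) -> str:
    """'50:c7:bf:12:34:56' -> '50C7BF123456' (the form the patterns use)."""
    return mac.replace(":", "").replace("-", "").upper()


def strict_match(hostname: str, mac: str) -> bool:
    """Hostname AND MAC prefix must both match the same entry."""
    mac = normalize_mac(mac)
    return any(
        fnmatchcase(hostname.lower(), m["hostname"])
        and fnmatchcase(mac, m["macaddress"])
        for m in MATCHERS
    )


def mac_only_match(mac: str) -> bool:
    """Relaxed variant: any known MAC prefix, regardless of hostname."""
    mac = normalize_mac(mac)
    return any(fnmatchcase(mac, m["macaddress"]) for m in MATCHERS)


print(strict_match("hs110", "50:c7:bf:12:34:56"))             # True
print(strict_match("living-room-plug", "50:c7:bf:12:34:56"))  # False
print(mac_only_match("50:c7:bf:12:34:56"))                    # True
```

A device announcing an unexpected hostname is missed by the strict check but caught by the MAC-only one; the relaxed variant would then need the connectivity check suggested above to weed out TP-Link hardware that is not a Kasa/Tapo device.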

On a perhaps relevant note about how you improved availability: @sdb9696 noticed that some devices throttle discovery requests (https://github.com/python-kasa/python-kasa/pull/1207), so disabling other Tapo/Kasa integrations that send out requests might indeed help alleviate the issue.

P.S. Your wish for a 'reboot' button has been answered, and there is now one you can enable in the 2024.11 release (see #127935) :-)

edit: I was correct OOB, that the hostname matching applies only to the initial discovery, as registered_devices in the manifest skips the hostname check for already known devices. Whether we should be less strict on the checks in general remains undecided.

rytilahti • Nov 04 '24 23:11

Wow, thank you @rytilahti , for your additional explanation. I have upgraded to 2024.11 and enabled that reboot for all my kasa devices! You guys rock!

AV0uu • Nov 08 '24 22:11

Closing this issue as network-related and (mostly, I think) resolved.

sdb9696 • Jan 10 '25 11:01