
3.2 reconnecting when switching stuff

Open jhein05 opened this issue 3 years ago • 30 comments

I upgraded to 3.2 today and it broke things for me. When I switch something on or off, the remote connection goes from connected to connecting, all relevant UI elements reload, and nothing happens. When I click a second time, the entity gets switched, but the UI doesn't show the state change. On the third click, the reconnect loop starts again and the correct states are displayed.

The log has the usual "heartbeat missed", "unable to find entity ...", "closing socket" error messages (had to recall those from memory, so they might be slightly off :)

Downgraded back to 3.1 and everything works flawlessly. Let me know what I can send to help.

jhein05 avatar Mar 05 '21 05:03 jhein05

I experienced the same! Downgraded to 3.1 and now it is working well again.

elgeniskogen avatar Mar 05 '21 11:03 elgeniskogen

Ideally you should not get missed heartbeats. Once every now and then is fine because things happen, but it should not be "the usual". Without proper extracts from the logs I can't tell you anything, so it would be good if you could provide full logs to start with.

I'm not sure exactly why this would happen. There aren't that many changes in 3.2. If you flick a switch on the main instance, is that reflected on the remote instance? I suspect it has something to do with subscribed events, since that is something that actually changed.

postlund avatar Mar 05 '21 11:03 postlund

Same issue here with switches not working...

jaym25 avatar Mar 05 '21 20:03 jaym25

Please try #113, I believe it will fix the issue.

postlund avatar Mar 05 '21 20:03 postlund

#113 Fixed the switch problem for me. Thank you!

jaym25 avatar Mar 06 '21 02:03 jaym25

#113 causes a new problem... I have a remote system running bi-directionally with the main system. In other words, both systems are running the HA Remote component and connecting to the other system's websocket on port 443. Everything is fine when the #113 fix is running on one system and the old version on the other system. The problem is that when the second system fires up with the #113 fix, the processor usage on both systems skyrockets from 2-4% to 65-75%, causing serious lag and other problems on both systems.

Reverting back to 3.1 solved the problem. Actually reverting only 1 system stopped the overload on both systems.

The 2 systems are both RasPi 4s, running the latest HA on Hassio. Did much checking and am sure the processor load problem is due to the #113 fix. It must be some kind of interaction issue...

jaym25 avatar Mar 06 '21 03:03 jaym25

@jaym25 How did you set up the integration? I think this is expected behavior as we have no official support for bi-directional set ups like yours. From version 3.2, the state_changed and service_call events are always subscribed to. Earlier, this was default if you did not specify subscribed_events in YAML but not if you set up via config flow (a bug). You basically had to manually add these events to get things working properly when setting up via config flow. So, I suspect that if you subscribe to state_changed on both ends and change state on an entity, the event would be sent to the remote instance that would trigger a state change being picked up by the main instance, creating an endless loop. We need to filter events to avoid this. I would imagine that your configuration lacked subscription to these events (at least on one side) and that's why it used to work.
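(Illustrative sketch, not the integration's actual code.) The suspected ping-pong can be modelled with a toy simulation: each side mirrors every state_changed event it receives, and the mirrored write fires a new state_changed event that goes back over the link. The `mirrored` flag, `simulate` function, and `MAX_HOPS` cap are hypothetical names introduced only for this example.

```python
MAX_HOPS = 50  # safety cap so the unfiltered case terminates in this demo


def simulate(filter_mirrored: bool) -> int:
    """Count how many state_changed events cross the link for one real change."""
    hops = 0
    # Each pending event is (entity_id, mirrored). mirrored=True means the
    # event was produced by copying a remote state, not by a real local change.
    pending = [("switch.plug", False)]
    while pending and hops < MAX_HOPS:
        entity_id, mirrored = pending.pop()
        if filter_mirrored and mirrored:
            continue  # drop the echo instead of forwarding it again
        hops += 1
        # The receiving side writes the state locally, which fires a new
        # state_changed event that the other side is also subscribed to.
        pending.append((entity_id, True))
    return hops


print(simulate(filter_mirrored=False))  # keeps bouncing until the cap is hit
print(simulate(filter_mirrored=True))   # one real change, one event
```

Under these assumptions, filtering out events that were themselves caused by mirroring is enough to stop the loop; whether that matches what actually happens between two real instances is exactly what is still unknown at this point in the thread.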

postlund avatar Mar 06 '21 06:03 postlund

@postlund This is my configuration. I have been doing bi-directional since the component was first released with no problems. Service call default may be the problem, but is it necessary to be default?...

My phone app is tied to the home system and I have a tab that monitors all aspects at the office and switches that can control things. I also have the system at the office disarm when I arrive, because the office system receives phone tracking info from the home system.

Very nice setup and very convenient...

On home system:

remote_homeassistant:
  instances:
  # Office HomeAssistant
  - host: !secret remote_ha_host
    port: 443
    secure: true
    verify_ssl: false
    access_token: !secret remote_ha_token
    entity_prefix: "office_"
    subscribe_events:
    # - service_registered
    - state_changed
    include:
    #   domains:
    #   - binary_sensor
      entities:
      - automation.security_front_door_open
      - automation.security_back_door_open
      - automation.security_carport_gate_open
      - automation.security_storage_door_open
      - automation.security_walking_gate_open
      - automation.security_vehicle_gate_open
      - automation.security_shed_door_open
      - automation.security_mailbox_open
      - automation.security_screen_door_open
      - automation.security_dog_door_open
      - automation.security_carport_motion
      - automation.security_back_yard_motion
      - automation.security_living_room_motion
      - automation.security_delivery_box_open
      
      - binary_sensor.presence
      - binary_sensor.back_yard_motion
      - binary_sensor.back_yard_tamper
      - binary_sensor.carport_motion
      - binary_sensor.carport_tamper
      - binary_sensor.shed_tamper
      - binary_sensor.system_power_fail
      - binary_sensor.updater
      - binary_sensor.wyzesense_779c1191  # Front Door
      - binary_sensor.wyzesense_779c1192  # Screen Door
      - binary_sensor.wyzesense_778c7853  # Dog Door
      - binary_sensor.wyzesense_778bdeed  # Back Door
      - binary_sensor.wyzesense_778bdf15  # Storage Door
      - binary_sensor.wyzesense_778c7975  # Carport Gate
      - binary_sensor.wyzesense_77855ee2  # Walking Gate
      - binary_sensor.wyzesense_778565d7  # Vehicle Gate
      - binary_sensor.wyzesense_778ba7f0  # Shed Door
      - binary_sensor.wyzesense_7786e169  # Delivery Box
      - binary_sensor.wyzesense_778468cd  # Mailbox
      - binary_sensor.wyzesense_779d6cb4  # Living Room Motion
      - binary_sensor.wyzesense_77855b88_moisture   # Attic Water

      - input_boolean.guest_mode
      - input_boolean.jay_email
      - input_boolean.neo_siren
      - input_boolean.system_loading
      
      - person.jay

      - sensor.ac_power
      - sensor.cpu_temperature
      - sensor.memory_use_percent
      - sensor.processor_use
      - sensor.swap_use_percent
      - sensor.version_update
      - sensor.shenzhen_neo_electronics_co_ltd_siren_alarm_battery_level
      - sensor.ups_battery_level
      - sensor.wyzesense_779c1191_battery_level  # Front Door
      - sensor.wyzesense_779c1192_battery_level  # Screen Door
      - sensor.wyzesense_778c7853_battery_level  # Dog Door
      - sensor.wyzesense_778bdeed_battery_level  # Back Door
      - sensor.wyzesense_778bdf15_battery_level  # Storage Door
      - sensor.wyzesense_778c7975_battery_level  # Carport Gate
      - sensor.wyzesense_77855ee2_battery_level  # Walking Gate
      - sensor.wyzesense_778565d7_battery_level  # Vehicle Gate
      - sensor.wyzesense_778ba7f0_battery_level  # Shed Door
      - sensor.wyzesense_7786e169_battery_level  # Delivery Box
      - sensor.wyzesense_778468cd_battery_level  # Mailbox
      - sensor.wyzesense_77855b88_battery_level  # Attic Water
      - sensor.wyzesense_779d6cb4_battery_level  # Living Room Motion
      - sensor.zooz_zse40_4_in_1_sensor_battery_level    # Carport Multi
      - sensor.zooz_zse40_4_in_1_sensor_battery_level_2  # Back Yard Multi
      - sensor.zooz_zse40_4_in_1_sensor_battery_level_3  # Shed Multi
      - sensor.zooz_zse40_4_in_1_sensor_luminance
      - sensor.zooz_zse40_4_in_1_sensor_luminance_2
      - sensor.zooz_zse40_4_in_1_sensor_luminance_3
      - sensor.zooz_zse40_4_in_1_sensor_relative_humidity
      - sensor.zooz_zse40_4_in_1_sensor_relative_humidity_2
      - sensor.zooz_zse40_4_in_1_sensor_relative_humidity_3
      - sensor.zooz_zse40_4_in_1_sensor_temperature
      - sensor.zooz_zse40_4_in_1_sensor_temperature_2
      - sensor.zooz_zse40_4_in_1_sensor_temperature_3
      
      - switch.shenzhen_neo_electronics_co_ltd_power_plug_12a_switch
    # exclude:
    #   domains:
    #   entities:

On office system:

remote_homeassistant:
  instances:
  # Home HomeAssistant
  - host: !secret remote_ha_host
    port: 443
    secure: true
    verify_ssl: false
    access_token: !secret remote_ha_token
    entity_prefix: "home_"
    subscribe_events:
    - state_changed
    include:
    #   domains:
    #   - binary_sensor
      entities:
      - device_tracker.ariela_sm_n970u1
    # exclude:
    #   domains:
    #   entities:

jaym25 avatar Mar 06 '21 07:03 jaym25

Right, my bad, I meant service_registered and not service_call. Not much changed in 3.2 (core parts), so it must be related to subscribed events. If you run 3.1 and add service_registered on both instances, do you see the same behavior?

postlund avatar Mar 06 '21 10:03 postlund

Yes. That is exactly what happens. In addition, when I am looking at the HA GUI on my Firefox browser, it makes my computer start working really hard and the fan speeds up and the browser slows to a crawl.

jaym25 avatar Mar 06 '21 15:03 jaym25

I've looked at the code trying to figure out what's going on, but I can't really see it. We don't rely on service_registered internally and I can't find any subscriptions to it in core (it is only generated when a new service is registered), so I don't see any reason why we would subscribe to it. But I don't see any reason why it would hurt either.

@lukas-hetzenecker Do you have any insight into why this event was part of the default events in the first place? Or why subscribing to it would generate an infinite ping-pong loop back and forth?

postlund avatar Mar 06 '21 18:03 postlund

@postlund it's definitely a bad loop... Should be easy to duplicate by setting up a similar arrangement on 2 lan machines or instances... even with just 1 subscribed sensor on each.

If it doesn't duplicate, it could be the ssl causing it.

Also, never needed service_registered and I have several sensors and switches and automations being monitored. And it's fast and smooth both ways, even over WAN with SSL.

The commented service_registered may be from when I set it up a couple of years ago, and I had negative effects and tried without it. Been that way ever since.

jaym25 avatar Mar 06 '21 19:03 jaym25

@postlund On ver 3.3, I changed __init__.py line 147 from INTERNALLY_USED_EVENTS = [EVENT_STATE_CHANGED, EVENT_SERVICE_REGISTERED] to INTERNALLY_USED_EVENTS = [EVENT_STATE_CHANGED] and it works fine for me.

I cannot be sure this will work in general because I am using YAML and not config flow. Hope this helps.
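(Illustrative sketch, not the integration's actual code.) Assuming the always-subscribed list is simply merged with whatever the user configured, the effect of the one-line change could look roughly like this. The constants mirror the event names used in this thread; `events_to_subscribe` is a hypothetical helper.

```python
EVENT_STATE_CHANGED = "state_changed"
EVENT_SERVICE_REGISTERED = "service_registered"

# After the change: service_registered is no longer forced on every setup.
INTERNALLY_USED_EVENTS = [EVENT_STATE_CHANGED]


def events_to_subscribe(user_events):
    """Merge always-on events with user-configured ones, without duplicates."""
    # dict.fromkeys keeps first-seen order while removing repeats.
    return list(dict.fromkeys([*INTERNALLY_USED_EVENTS, *user_events]))


# A setup that lists nothing extra only gets state_changed.
print(events_to_subscribe([]))
# A user who still wants service_registered can opt in explicitly.
print(events_to_subscribe([EVENT_STATE_CHANGED, EVENT_SERVICE_REGISTERED]))
```

With this shape, removing the event from the default list does not stop anyone from adding it back under subscribe_events.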

jaym25 avatar Mar 07 '21 16:03 jaym25

@jaym25 That would be the correct change to remedy the situation. I would however like to know why this happens in the first place and if that event is even needed. I'll see if I can set up two instances and reproduce when I have some time.

postlund avatar Mar 08 '21 11:03 postlund

@postlund I did some further testing with EVENT_SERVICE_REGISTERED removed, and this is what I found: I removed the integration from the office. The one at the house worked fine 1-way. I removed the integration from the house. The one at the office worked fine 1-way. I set up the new office integration with config flow (no YAML) and it works fine, 1-way and bi-directional. I still have the home integration on YAML, and the entire setup works fine bi-directional. I will try to set up the home integration with config-flow soon.

What I can see from this testing is that you should be able to safely remove EVENT_SERVICE_REGISTERED from future releases with little or no problem. Worst case, the user could add it on page 4 of the config flow setup (or in YAML) if some entities aren't responding. Just my thoughts. p.s. I checked the logs... no new warnings or errors

jaym25 avatar Mar 08 '21 16:03 jaym25

@postlund I have a question... Do you save any significant network traffic or resources by limiting the number of entities that are transferred from the client system when using this integration?

jaym25 avatar Mar 08 '21 17:03 jaym25

The problem has more to do with what functionality we would lose if the event isn't listened to. It is fired whenever a new service is registered, so I would imagine that it would at least be used to register new services in the main instance when a new service is registered on the remote instance. However, this does not seem to be implemented (and would need special care to handle). I don't think it's very common to register new services dynamically (other than when loading a new component), but I know that esphome, for instance, does it and we don't support that. It can however be solved with a proxy service. I might be missing something, but it should be safe to remove for now.

postlund avatar Mar 08 '21 17:03 postlund

@postlund I have a question... Do you save any significant network traffic or resources by limiting the number of entities that are transferred from the client system when using this integration?

We get the state of an entity via the state_changed event and it is not possible to limit which entities we get these events from in the API. We get all state changes and just ignore the ones we are not interested in. So the answer is unfortunately no.
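(Illustrative sketch, not the integration's actual code.) The "receive everything, ignore what we don't want" approach amounts to a client-side filter like the one below. The event dicts are simplified stand-ins for state_changed payloads, and `is_wanted` and `INCLUDE_ENTITIES` are hypothetical names.

```python
# Entities the user asked for (taken from the office-side config in this thread).
INCLUDE_ENTITIES = {"device_tracker.ariela_sm_n970u1"}


def is_wanted(event, include_entities):
    """Return True if a state_changed event is for an included entity."""
    return event.get("data", {}).get("entity_id") in include_entities


# Every state change arrives over the websocket, wanted or not.
incoming = [
    {"event_type": "state_changed", "data": {"entity_id": "device_tracker.ariela_sm_n970u1"}},
    {"event_type": "state_changed", "data": {"entity_id": "sensor.cpu_temperature"}},
]

kept = [e["data"]["entity_id"] for e in incoming if is_wanted(e, INCLUDE_ENTITIES)]
print(kept)
```

The filtering only decides which entities get created locally; the network traffic is the same either way, which is why a shorter include list does not save bandwidth.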

postlund avatar Mar 08 '21 17:03 postlund

That actually makes it easier for me to change my home system to config flow, since all state changes are already being transferred and I'm not seeing any resource (processor 2-5%) or network traffic problems. Thank you for your quick response.

jaym25 avatar Mar 08 '21 17:03 jaym25

However, for purposes of reducing entity clutter of unneeded entities, accepting wildcards in the include and exclude options would be cool... just saying.

jaym25 avatar Mar 08 '21 18:03 jaym25

@jhein05 Sure thing! 👍

postlund avatar Mar 08 '21 18:03 postlund

cool! lol

jaym25 avatar Mar 08 '21 18:03 jaym25

That's totally possible, but I would probably go with a regexp in that case as dealing with lists of wildcard inputs like that isn't possible in the config flow input entry (at least not in a decently good manner). I don't like accepting a string with multiple values, e.g. light.foo_*,switch.bar_*, that kind of separation should be handled on a higher level.
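(Illustrative sketch, not something the integration supports at this point in the thread.) One regexp per entry, matched against the full entity ID, could look like this in Python. The patterns and the `compile_patterns` / `is_included` helpers are hypothetical; `fnmatch.translate` from the standard library is shown only to illustrate that a wildcard can be converted into an equivalent regexp.

```python
import fnmatch
import re


def compile_patterns(patterns):
    """Compile one regexp per configured entry."""
    return [re.compile(p) for p in patterns]


def is_included(entity_id, compiled):
    """True if the whole entity ID matches at least one pattern."""
    return any(rx.fullmatch(entity_id) for rx in compiled)


include = compile_patterns([
    r"binary_sensor\.wyzesense_.*",
    r"sensor\.zooz_.*_battery_level(_\d+)?",
])

print(is_included("binary_sensor.wyzesense_779c1191", include))
print(is_included("sensor.zooz_zse40_4_in_1_sensor_battery_level_2", include))
print(is_included("sensor.zooz_zse40_4_in_1_sensor_luminance", include))

# A shell-style wildcard can be turned into a regexp if that syntax is preferred.
wildcard_rx = re.compile(fnmatch.translate("light.foo_*"))
print(bool(wildcard_rx.match("light.foo_lamp")))
```

Keeping each list entry a single pattern avoids having to split a comma-separated string such as light.foo_*,switch.bar_* inside one input field.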

postlund avatar Mar 08 '21 18:03 postlund

Anything would be a plus, as you can see above, I have a lot of wyzesense and zooz sensors that could be entered with one line... and new ones would automatically be included...

jaym25 avatar Mar 08 '21 18:03 jaym25

but even then, the zooz would have to be filtered more since z-wave creates so many unneeded entities...

jaym25 avatar Mar 08 '21 18:03 jaym25

Yep, the "auto" support is something that I would like as well as that is pretty cool.

postlund avatar Mar 08 '21 18:03 postlund

Awesome, I'll expect that in v3.4 by Friday! LOL

jaym25 avatar Mar 08 '21 18:03 jaym25

@postlund in all seriousness, thank you for taking over this component and the excellent work you've done in organizing, improving and bringing this most useful integration up to date!

jaym25 avatar Mar 09 '21 13:03 jaym25

Thanks @jaym25, I appreciate hearing that! 👍

postlund avatar Mar 09 '21 21:03 postlund

That's totally possible, but I would probably go with a regexp in that case as dealing with lists of wildcard inputs like that isn't possible in the config flow input entry (at least not in a decently good manner). I don't like accepting a string with multiple values, e.g. light.foo_*,switch.bar_*, that kind of separation should be handled on a higher level.

Any updates on wildcards? :)

JonathanTreffler avatar Mar 01 '22 23:03 JonathanTreffler