monitor_docker

Remote host not reconnecting after power off

Open Londoneye02 opened this issue 3 years ago • 8 comments

I have an installation on a Raspberry Pi that monitors both the Raspberry Pi itself and a remote host running Debian.

Below is the configuration:

- name: Docker_Raspberry
  containers:
    - pihole
    - nginx_manager
    - duplicati
    - grafana
    - homer
  rename:
    pihole: PiHole
    nginx_manager: Nginx Proxy Manager
    duplicati: Duplicati
    grafana: Grafana
    homer: Homer

  monitored_conditions:
    - version
    - containers_running
    - containers_total
    - state
    - status
    - memory
    - network_total_up


- name: Docker_DebianServer
  url: tcp://192.168.1.117:2375
  containers:
    - jellyfin
  rename:
    jellyfin: Jellyfin

  monitored_conditions:
    - version
    - containers_running
    - containers_total
    - state
    - status
    - memory
    - network_total_up
    - network_speed_up
    - network_speed_down

It works fine if I start the Raspberry Pi after the Debian server. But if I turn off the remote server, the integration does not receive any data from it when it comes back online.

[Screenshot: Screenshot_20220615-182042-183]

Is this the normal behaviour, or do I have something wrong in the configuration? Most of the time it does not work. On very few occasions, it works.
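
One way to narrow this down is to check, independently of Home Assistant, whether the remote Docker API answers again once the server is back. Below is a minimal sketch, assuming the daemon is exposed over plain TCP as in the configuration above. It uses the Docker Engine API's `/_ping` endpoint; `ping_url` and `docker_reachable` are hypothetical helper names and not part of monitor_docker.

```python
import http.client
import urllib.error
import urllib.request


def ping_url(docker_url: str) -> str:
    """Turn a tcp:// or http:// Docker URL into the URL of its /_ping endpoint."""
    if docker_url.startswith("tcp://"):
        docker_url = "http://" + docker_url[len("tcp://"):]
    return docker_url.rstrip("/") + "/_ping"


def docker_reachable(docker_url: str, timeout: float = 3.0) -> bool:
    """Return True if the Docker API answers /_ping with HTTP 200."""
    try:
        with urllib.request.urlopen(ping_url(docker_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, non-HTTP reply, etc.
        return False
```

Calling `docker_reachable("tcp://192.168.1.117:2375")` from the Raspberry Pi before and after the Debian server reboots would show whether the API itself is back while the integration still reports nothing.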

Thank you for your time

Londoneye02 avatar Jun 15 '22 16:06 Londoneye02

I will test this soon. I am also not sure what the behavior should be (I depend on the underlying library).

ualex73 avatar May 23 '24 18:05 ualex73

Same on my system. When the remote docker_proxy restarts, the integration loses the connection and I have to restart Home Assistant to re-enable it:

2024-07-06 10:19:15.279 ERROR (MainThread) [custom_components.monitor_docker.helpers] [xxxxx] bitwarden: Container not available anymore (3a) (DockerError(900, 'Cannot connect to Docker Engine via tcp://xxxxx.lan:2375 [Server disconnected]'))
2024-07-06 10:19:15.280 ERROR (MainThread) [custom_components.monitor_docker.helpers] [xxxxx] dockerproxy_xxxxx: Container not available anymore (3a) (DockerError(900, 'Cannot connect to Docker Engine via tcp://xxxxx.lan:2375 [Server disconnected]'))
2024-07-06 10:19:18.149 ERROR (MainThread) [custom_components.monitor_docker.helpers] [xxxxx]: run_docker_events loop ended
2024-07-06 10:20:21.219 ERROR (MainThread) [custom_components.monitor_docker.helpers] [xxxxx]: Trying to get a not existing container adminer
2024-07-06 10:20:21.219 ERROR (MainThread) [custom_components.monitor_docker.switch] Service restart failed, container 'adminer'does not exist 

RK62 avatar Jul 06 '24 08:07 RK62

@RK62 this is also with v1.19?

ualex73 avatar Jul 06 '24 12:07 ualex73

@ualex73 Yes, I installed and tried it today with:

  • Custom Monitor Docker component for Home Assistant v1.19
  • ghcr.io/tecnativa/docker-socket-proxy:0.1.2
  • HA-Core 2024.7.1

RK62 avatar Jul 06 '24 14:07 RK62

I can confirm this. The monitor_docker component never attempts to reconnect to a remote host if it loses the connection.

Is there any way to restart the component without restarting home assistant?

nobodyspecial avatar Jul 19 '24 15:07 nobodyspecial

I'm seeing this as well. If it loses connection it can't seem to re-establish, even though it is running. Restart of HA fixes it temporarily, and then I see errors again and lose current status.

sstratoti avatar Jul 26 '24 03:07 sstratoti

I am aware of the issue and am still investigating how we can make it work (part of the problem is in the underlying library I use).
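
As an illustration of the general pattern only (not the integration's actual code, and not a fix for the underlying library), a reconnect loop with capped exponential backoff could be sketched in asyncio like this. `connect`, `run_with_reconnect`, and the delay values are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_with_reconnect(
    connect: Callable[[], Awaitable[None]],
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    max_attempts: Optional[int] = None,
) -> int:
    """Call `connect` until it succeeds and return the number of attempts.

    `connect` is a hypothetical coroutine that raises OSError (for example
    ConnectionRefusedError) while the remote host is unreachable.
    """
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            await connect()
            return attempts
        except OSError:
            # Give up only if the caller set an attempt limit.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)  # back off, but cap the wait
```

With `max_attempts` left at `None`, the loop keeps retrying for as long as the host stays down, which is the behaviour wanted for a server that may be powered off for hours.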

ualex73 avatar Jul 27 '24 16:07 ualex73

Sounds like you're on it, but I just wanted to add: I tried increasing my timeouts through a haproxy.cfg file mapped into tecnativa/docker-socket-proxy. I increased the timeouts to 1m and still see the same issue.

Thank you for looking into this!

Also I think tecnativa/docker-socket-proxy recently (last few weeks) updated to haproxy 3.0 in their main branch? Not sure if that matters either.

sstratoti avatar Jul 27 '24 20:07 sstratoti

Can you try the 1.20b0 (pre-release)? It should resolve this.

BTW, entities still disappear when the host is rebooted ... but they should come back after the host is online.

ualex73 avatar Jan 11 '25 20:01 ualex73

I tried this having been following this thread.

Good news: it seems to reconnect! I rebooted and after the container was up for idk, 30 seconds or so it reconnected.

Bad news: The host machine where HA and other containers live no longer works. I'm not sure if there was a breaking change, but I didn't change my HA docker-compose or my monitor_docker config, and it worked great before.

Here's some snippets if it helps:

version: '3.8'
services:
    homeassistant:
        ports:
          - '8123:8123'
        network_mode: host
        container_name: homeassistant
        image: "ghcr.io/home-assistant/home-assistant:stable"
        environment:
          - TZ=America/Chicago
        volumes:
          ...
          # Required for Monitor Docker integration
          - /var/run/docker.sock:/var/run/docker.sock
          ...
        # Maps the Zigbee controller
        restart: unless-stopped
        privileged: false

monitor_docker:
  # This remains unchanged and worked before
  - name: Docker - Media
    scan_interval: 120
    # containers_exclude:
    switchenabled: false
    buttonenabled: false
    monitored_conditions:
      # - allinone
      # - version
      - containers_running
      - containers_total
      ...

  # This one works great now!
  - name: Docker - Raspi
    url: http://10.75.50.10:2375
    scan_interval: 120
    ...

pww217 avatar Jan 12 '25 02:01 pww217

I should mention the entities do not appear at ALL in the integration. It's not a name change or anything, they are simply not there while the Raspi ones are! Strange.
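
A quick way to rule out the socket mount itself is to ping the daemon over /var/run/docker.sock from inside the Home Assistant container. Below is a minimal sketch using only the standard library. `local_docker_ping` is a hypothetical helper, not part of monitor_docker, and `/_ping` is the Docker Engine API health endpoint.

```python
import socket


def local_docker_ping(sock_path: str = "/var/run/docker.sock", timeout: float = 3.0) -> bool:
    """Return True if the daemon behind `sock_path` answers GET /_ping with 200."""
    request = b"GET /_ping HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
            s.sendall(request)
            # The status line is short, so one recv is enough for this check.
            status_line = s.recv(1024).split(b"\r\n", 1)[0]
    except OSError:
        # Missing socket file, permission denied, timeout, etc.
        return False
    parts = status_line.split()
    return len(parts) >= 2 and parts[1] == b"200"
```

If this returns `True` inside the container while the local entities are still missing, the mount and permissions are fine and the problem lies in the integration rather than in the compose file.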

pww217 avatar Jan 12 '25 02:01 pww217

And naturally, returning to the stable release fixed the first one.

pww217 avatar Jan 12 '25 02:01 pww217

@pww217 thanks for testing. Let me try to reproduce the scenario with the local socket (I was only testing heavily via TCP connections, sorry).

ualex73 avatar Jan 12 '25 09:01 ualex73

@pww217 can you try 1.20b1? It should be fixed now (it was a mistake on my end, sorry).

ualex73 avatar Jan 12 '25 09:01 ualex73

I am still testing with TLS, and it is possible that TLS is not fully working yet in 1.20b1.

ualex73 avatar Jan 12 '25 10:01 ualex73

Seems to work on 1.20b1! Although I don't use TLS so I can't speak to that, but I rebooted both the host machine (with HA on it) and the remote one and it seems to function fine. Thanks for the fix!!

pww217 avatar Jan 12 '25 15:01 pww217

I cannot upgrade to 1.20b1 from HACS. b0 was available but not this one. :-)

Londoneye02 avatar Jan 13 '25 07:01 Londoneye02

Can you try again? HACS does some GitHub scans in the background, so it is not real time.

ualex73 avatar Jan 13 '25 17:01 ualex73

I found it!! I needed to install it via "installing a different version" in HACS. It works fine, and it seems I can monitor all my containers, both local and remote. Thank you!!!

Londoneye02 avatar Jan 13 '25 21:01 Londoneye02

It should be fixed, so I will close the ticket in 7 days if nobody objects :-)

ualex73 avatar Jan 16 '25 17:01 ualex73

I will close this one. If it recurs, please reopen.

ualex73 avatar Jan 26 '25 10:01 ualex73