
autoheal constantly restarts on linux/arm/v7

Open David-Lor opened this issue 2 years ago • 6 comments

Tried deploying a container with this image on Armbian (an OrangePi PC board), but it constantly restarts.

First, I tried running it with the following Docker Compose file (I previously deployed the exact same Compose file on a linux/amd64 host, where it works fine):

version: '3'

services:
  autoheal:
    # https://github.com/willfarrell/docker-autoheal
    container_name: autoheah
    image: willfarrell/autoheal:latest
    network_mode: none
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
      - AUTOHEAL_INTERVAL=10
      - AUTOHEAL_START_PERIOD=60
      - AUTOHEAL_DEFAULT_STOP_TIMEOUT=25
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    restart: always

The container restarts constantly, producing the following output (roughly once every 60 seconds, because I set AUTOHEAL_START_PERIOD to 60):

2021-08-28T11:13:04.616302674Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:14:07.059787538Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:15:09.648184993Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:16:11.946453731Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:17:14.241354914Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:18:16.564788868Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:19:18.855314561Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:20:21.163773499Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:21:23.453673455Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:22:25.853702825Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:23:28.171957971Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:24:30.817528380Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:25:33.123744990Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:26:35.328544128Z Monitoring containers for unhealthy status in 60 second(s)

Running the proposed docker run command:

docker run -d \
    --name autoheal \
    --restart=always \
    -e AUTOHEAL_CONTAINER_LABEL=all \
    -v /var/run/docker.sock:/var/run/docker.sock \
    willfarrell/autoheal

makes the container restart constantly, with no output at all. If I set the restart policy to none, the container exits with code 28 in both cases.
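To record the exit code and restart count without watching the logs, something like the following may help. It is only a sketch: both helper names are made up for illustration, and the hint for code 28 assumes the exit status is passed through from curl (where exit code 28 means the operation timed out), which nothing in this report confirms.

```shell
# Hypothetical helper: print "<exit code> <restart count>" for a container.
container_exit_info() {
    docker inspect --format '{{.State.ExitCode}} {{.RestartCount}}' "$1"
}

# Hypothetical helper: turn an exit code into a human-readable hint.
# Assumption: the code comes from curl, where 28 means "operation timed out".
exit_code_hint() {
    case "$1" in
        0)  echo "clean exit" ;;
        28) echo "possibly a curl timeout (curl exit code 28)" ;;
        *)  echo "unrecognized exit code $1" ;;
    esac
}
```

For example, `container_exit_info autoheal` prints the two values for the container started above, and `exit_code_hint 28` prints the hint for the code seen here.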

Host system info:

 OS: Debian 10 buster
 Kernel: armv7l Linux 5.10.43-sunxi
 Uptime: 4d 21h 58m
 Packages: 479
 Shell: 17285
 CPU: ARMv7 rev 5 (v7l) @ 4x 1.368GHz [53.0°C]
 GPU: 
 RAM: 362MiB / 999MiB

David-Lor avatar Aug 28 '21 11:08 David-Lor

I have the same issue with the later versions of this image. An older version seems to work well (although I am not sure what has changed since). willfarrell/autoheal@sha256:0ad8b27083d065b8c22ea4db6b245097b8e3d3e44090196b11559de88801020c is a digest that currently works on linux/arm/v7 (specifically an RPi4). According to the README, latest is built daily, but I have not yet looked into what may have changed to cause this breakage.
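As a stopgap, the image can be pinned to that digest instead of `latest`. A minimal sketch, reusing the `docker run` options from the original report; the variable and function names are made up for illustration:

```shell
# Digest reported in this thread as working on linux/arm/v7.
AUTOHEAL_IMAGE='willfarrell/autoheal@sha256:0ad8b27083d065b8c22ea4db6b245097b8e3d3e44090196b11559de88801020c'

# Same options as the docker run command in the original report,
# with the image pinned by digest instead of using the latest tag.
run_pinned_autoheal() {
    docker run -d \
        --name autoheal \
        --restart=always \
        -e AUTOHEAL_CONTAINER_LABEL=all \
        -v /var/run/docker.sock:/var/run/docker.sock \
        "$AUTOHEAL_IMAGE"
}
```

A pinned digest never picks up the daily rebuilds, so this trades updates for stability.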

itrogers avatar Sep 30 '21 21:09 itrogers

TL;DR: breaking change in Alpine base images ver. >= 3.13.0 (description about the issue and solutions: https://wiki.alpinelinux.org/wiki/Release_Notes_for_Alpine_3.13.0#time64_requirements). Can be fixed with this or by upgrading containerd.io or other packages (as stated in the alpine docs), among other things (see official documentation link).

@itrogers Now that you mention that a previous build worked for you, I think I found the problem, which I ran into some weeks ago with python:alpine images (https://github.com/docker-library/python/issues/637). It seems that Alpine 3.13.0 introduced a breaking change affecting 32-bit platforms. I'm not really into the details, but it's well explained here: https://wiki.alpinelinux.org/wiki/Release_Notes_for_Alpine_3.13.0#time64_requirements

I tried the same fix (described in https://github.com/docker-library/python/issues/637#issuecomment-904544160) and it worked!
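To see whether a host is affected, the installed versions can be compared with the minimums in the Alpine release notes. The sketch below assumes a Debian-based host (it uses `dpkg-query`) and minimums of Docker 19.03.9 and libseccomp 2.4.2. Those two numbers are my reading of the release notes, so please verify them against the linked page. The function names are made up for illustration.

```shell
# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimums; verify them against the Alpine 3.13 release notes.
MIN_DOCKER='19.03.9'
MIN_LIBSECCOMP='2.4.2'

# Print a line for every host component that is older than its assumed minimum.
check_time64_host() {
    docker_ver=$(docker version --format '{{.Server.Version}}')
    seccomp_ver=$(dpkg-query -W -f='${Version}' libseccomp2)
    version_ge "$docker_ver" "$MIN_DOCKER" ||
        echo "Docker $docker_ver is older than $MIN_DOCKER"
    version_ge "$seccomp_ver" "$MIN_LIBSECCOMP" ||
        echo "libseccomp2 $seccomp_ver is older than $MIN_LIBSECCOMP"
}
```

Running `check_time64_host` prints nothing when both versions meet the assumed minimums.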

I created a simple compose that deploys a failing container for testing the functionality:

version: "3"

services:
  failing-healthcheck:
    container_name: failing-healthcheck-test
    image: debian:11
    command: sleep infinity
    network_mode: none
    healthcheck:
      test: ["CMD", "ls", "foo"]
      interval: 5s
      timeout: 1s
      retries: 3
      start_period: 0s
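
To confirm that Docker really flags this test container as unhealthy before blaming autoheal, its health status can be polled with `docker inspect`. A minimal sketch; the helper names and the 5-second poll interval are my own choices:

```shell
# Print the health status Docker reports for a container
# (starting, healthy, or unhealthy).
health_status() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Poll until the container is reported unhealthy; give up after N tries
# (default 12), sleeping 5 seconds between attempts.
wait_until_unhealthy() {
    name=$1
    tries=${2:-12}
    while [ "$tries" -gt 0 ]; do
        [ "$(health_status "$name")" = "unhealthy" ] && return 0
        tries=$((tries - 1))
        sleep 5
    done
    return 1
}
```

For example: `wait_until_unhealthy failing-healthcheck-test && echo "marked unhealthy"`.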

The autoheal container was working properly:

01-10-2021 15:19:38 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:20:11 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:20:43 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:21:17 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:21:49 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:22:22 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:22:55 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:23:28 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout

Another way to make it work on armv7 would be to use an Alpine base image older than 3.13. This could be automated with build ARGs, parametrizing the architectures in the build workflow. What do you think, @willfarrell? (I could try working on a PR if you agree.)
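A rough sketch of what that could look like in a build script. Everything here is hypothetical: it assumes the Dockerfile declares `ARG ALPINE_VERSION` and uses it in its `FROM` line, and the version mapping and image tag are placeholders of mine, not something the repository has today.

```shell
# Hypothetical mapping: pin 32-bit ARM builds to the last Alpine release
# before 3.13, and let every other platform use a newer base.
alpine_version_for() {
    case "$1" in
        linux/arm/v6|linux/arm/v7) echo "3.12" ;;
        *)                         echo "3.13" ;;
    esac
}

# Build for one platform, passing the matching Alpine version as a build arg.
# Assumes the Dockerfile declares `ARG ALPINE_VERSION` (not confirmed here).
build_for_platform() {
    docker buildx build \
        --platform "$1" \
        --build-arg "ALPINE_VERSION=$(alpine_version_for "$1")" \
        --load \
        -t "autoheal:test-$(echo "$1" | tr '/' '-')" \
        .
}
```

For example, `build_for_platform linux/arm/v7` would build with `ALPINE_VERSION=3.12`.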

That being said, the Alpine documentation also states that upgrading containerd.io, or Docker plus libseccomp, to certain versions also "fixes" the issue, but maybe not all platforms can upgrade to those versions. In my case, I just checked the updates available on my board and was able to upgrade containerd.io to version 1.4.10-1, which also fixes the issue.
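
On a Debian-based host, checking for and applying that upgrade could look like the sketch below. It assumes `containerd.io` comes from an apt repository that is already configured; the function names are made up for illustration.

```shell
# Show the installed and candidate versions of containerd.io.
containerd_versions() {
    apt-cache policy containerd.io
}

# Upgrade only containerd.io, leaving every other package untouched.
upgrade_containerd() {
    sudo apt-get update &&
        sudo apt-get install --only-upgrade containerd.io
}
```

After the upgrade, the Docker daemon may need a restart before the autoheal container is recreated.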

David-Lor avatar Oct 01 '21 15:10 David-Lor

seems to be broken for RPi4 using latest image. @itrogers suggested image works.

ishdemon avatar Nov 14 '21 13:11 ishdemon

still broken on RPi4 :(

zpatten avatar Mar 13 '22 15:03 zpatten

You can use this instead https://github.com/qdm12/deunhealth. It does the same thing and works on a RPI4.

ljford7 avatar May 01 '22 16:05 ljford7

seems to be broken for RPi4 using latest image. @itrogers suggested image works.

+1

@willfarrell Is this image still maintained?

Freekers avatar Aug 22 '22 07:08 Freekers