docker-autoheal
autoheal constantly restarts on linux/arm/v7
Tried deploying a container with this image on Armbian (an OrangePi PC board), but it constantly restarts.
First, I tried running it with the following Docker Compose file (the exact same file works fine on a linux/amd64 host):
version: '3'
services:
  autoheal:
    # https://github.com/willfarrell/docker-autoheal
    container_name: autoheah
    image: willfarrell/autoheal:latest
    network_mode: none
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
      - AUTOHEAL_INTERVAL=10
      - AUTOHEAL_START_PERIOD=60
      - AUTOHEAL_DEFAULT_STOP_TIMEOUT=25
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    restart: always
The container restarts constantly, producing the following output (roughly once every 60 seconds, because I set AUTOHEAL_START_PERIOD to 60):
2021-08-28T11:13:04.616302674Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:14:07.059787538Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:15:09.648184993Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:16:11.946453731Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:17:14.241354914Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:18:16.564788868Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:19:18.855314561Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:20:21.163773499Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:21:23.453673455Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:22:25.853702825Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:23:28.171957971Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:24:30.817528380Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:25:33.123744990Z Monitoring containers for unhealthy status in 60 second(s)
2021-08-28T11:26:35.328544128Z Monitoring containers for unhealthy status in 60 second(s)
Running the proposed docker run command:
docker run -d \
    --name autoheal \
    --restart=always \
    -e AUTOHEAL_CONTAINER_LABEL=all \
    -v /var/run/docker.sock:/var/run/docker.sock \
    willfarrell/autoheal
also causes the container to restart constantly, this time with no output at all. If I disable the restart policy, the container exits with code 28 in both cases.
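For anyone wanting to reproduce the diagnosis, here is a minimal sketch of how the exit code and restart count can be read back from Docker. The helper names are made up for illustration. The hint for code 28 assumes the exit code is passed through from curl (where 28 means the operation timed out); whether that is really where autoheal's exit code comes from is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Print the last exit code and the restart count of a container.
# Needs the docker CLI; pass the container name, e.g. "autoheal".
container_exit_info() {
    docker inspect --format 'exit={{.State.ExitCode}} restarts={{.RestartCount}}' "$1"
}

# Turn an exit code into a rough hint (hypothetical helper).
# 28 is curl's "operation timed out" code; that the container's exit
# code originates from curl is an ASSUMPTION.
describe_exit_code() {
    case "$1" in
        0)   echo "clean exit" ;;
        28)  echo "possibly curl: operation timed out" ;;
        137) echo "killed by SIGKILL (128+9)" ;;
        *)   echo "unknown, check the container logs" ;;
    esac
}
```

Usage would be something like `container_exit_info autoheal` followed by `describe_exit_code 28`.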
Host system info:
OS: Debian 10 buster
Kernel: armv7l Linux 5.10.43-sunxi
Uptime: 4d 21h 58m
Packages: 479
Shell: 17285
CPU: ARMv7 rev 5 (v7l) @ 4x 1.368GHz [53.0°C]
GPU:
RAM: 362MiB / 999MiB
I have the same issue with the later versions of this image. An older version seems to work well (although I am not sure what has changed since): willfarrell/autoheal@sha256:0ad8b27083d065b8c22ea4db6b245097b8e3d3e44090196b11559de88801020c is a digest that currently works on linux/arm/v7 (specifically an RPi4). According to the README, latest is built daily, but I have not yet looked into what may have changed that would cause this to break.
TL;DR: this is a breaking change in Alpine base images >= 3.13.0 (description of the issue and solutions: https://wiki.alpinelinux.org/wiki/Release_Notes_for_Alpine_3.13.0#time64_requirements). It can be fixed with the workaround linked below, or by upgrading containerd.io or other packages as stated in the Alpine docs, among other options (see the official documentation link).
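A rough sketch of how a host could be checked against the time64 requirements. The thresholds used here (Docker 19.03.9 and libseccomp 2.4.2) are my reading of the linked Alpine release notes, so please verify them on that page before relying on this. The function names are hypothetical, and the comparison relies on `sort -V` being available.

```shell
#!/bin/sh
# version_ge A B: succeeds if version A >= version B (uses `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_time64_host DOCKER_VERSION LIBSECCOMP_VERSION
# Thresholds are ASSUMED from the Alpine 3.13.0 release notes.
check_time64_host() {
    if version_ge "$1" 19.03.9 && version_ge "$2" 2.4.2; then
        echo "ok"
    else
        echo "too old"
    fi
}

# Example on a Debian host (not executed here):
#   check_time64_host \
#       "$(docker version --format '{{.Server.Version}}')" \
#       "$(dpkg-query -W -f='${Version}' libseccomp2)"
```

If this prints "too old", Alpine >= 3.13 based images on a 32-bit host are likely affected.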
@itrogers Now that you mention that a previous build worked for you, I think I found the problem; I ran into the same thing some weeks ago with python:alpine images (https://github.com/docker-library/python/issues/637). It seems that Alpine 3.13.0 introduced a breaking change affecting 32-bit platforms. I'm not really into the details, but it's well explained here: https://wiki.alpinelinux.org/wiki/Release_Notes_for_Alpine_3.13.0#time64_requirements
I tried the same fix (described in https://github.com/docker-library/python/issues/637#issuecomment-904544160) and it worked!
I created a simple compose that deploys a failing container for testing the functionality:
version: "3"
services:
  failing-healthcheck:
    container_name: failing-healthcheck-test
    image: debian:11
    command: sleep infinity
    network_mode: none
    healthcheck:
      test: ["CMD", "ls", "foo"]
      interval: 5s
      timeout: 1s
      retries: 3
      start_period: 0s
The autoheal container was working properly:
01-10-2021 15:19:38 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:20:11 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:20:43 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:21:17 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:21:49 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:22:22 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:22:55 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
01-10-2021 15:23:28 Container /failing-healthcheck-test (87f1e0d23036) found to be unhealthy - Restarting container now with 10s timeout
I guess another way to make it work on armv7 would be to use an Alpine base image older than 3.13. This could be automated using build ARGs and by parametrizing the architectures in the build workflow. What do you think, @willfarrell? (I could try working on a PR if you agree.)
That being said, the Alpine documentation also states that upgrading containerd.io, or Docker plus libseccomp, to certain versions also "fixes" the issue, though maybe not all platforms can upgrade to those versions. In my case, I just checked the updates available on my board and could upgrade containerd.io to version 1.4.10-1, which also fixes the issue.
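To make the build ARG idea a bit more concrete, here is a minimal sketch of how a build script could pick the Alpine tag per platform. Everything named here is an assumption: `ALPINE_VERSION` is a hypothetical build argument (the Dockerfile would need a matching `ARG ALPINE_VERSION` before a `FROM alpine:${ALPINE_VERSION}` line), the list of 32-bit platforms is my guess, and the image tag in the usage line is made up. The script only prints the build command rather than running it.

```shell
#!/bin/sh
# Pick an Alpine tag for a target platform (hypothetical helper).
# 3.12 is the last release line before the time64 change in 3.13;
# which platforms count as affected is an ASSUMPTION.
alpine_version_for() {
    case "$1" in
        linux/arm/v6|linux/arm/v7|linux/386) echo "3.12" ;;
        *) echo "latest" ;;
    esac
}

# print_build_command PLATFORM IMAGE_TAG
# Prints (does not run) a buildx command using the hypothetical
# ALPINE_VERSION build argument.
print_build_command() {
    printf 'docker buildx build --platform %s --build-arg ALPINE_VERSION=%s -t %s .\n' \
        "$1" "$(alpine_version_for "$1")" "$2"
}
```

For example, `print_build_command linux/arm/v7 autoheal:armv7` would print a command that pins the armv7 build to Alpine 3.12 while other platforms keep following the newest base image. Pinning an old base image does mean missing out on later Alpine updates for those platforms.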
seems to be broken for RPi4 using latest image. @itrogers suggested image works.
still broken on RPi4 :(
You can use https://github.com/qdm12/deunhealth instead. It does the same thing and works on an RPi4.
seems to be broken for RPi4 using latest image. @itrogers suggested image works.
+1
@willfarrell Is this image still maintained?