# [BUG] Health Check Stuck on Starting/Unhealthy on Docker

### Environment

Self-Hosted (Docker)

### System

Docker 20.x.x

### Version

2.1.1

### Describe the problem
I have Dashy running with the environment variable PORT set to 4000. The container runs on Docker with the host network, because status checks to some other containerised hosts were not working when running in bridge mode.

Below is the output of `printenv` and of a manually run health check. I am not sure which port the check is attempting to connect to, but I think it is trying to hit port 80. That port is currently in use by another service on the physical host, which is why the PORT environment variable is set: Dashy would not start without the port change.
```
/app # printenv
NODE_VERSION=16.13.2
HOSTNAME=0762cc75728d
YARN_VERSION=1.22.15
SHLVL=1
PORT=4000
HOME=/root
HOST_OS=Ubuntu
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
GID=100
UID=99
DIRECTORY=/app
IS_DOCKER=true
HOST_HOSTNAME=rXXXXXXXXXX1
HOST_CONTAINERNAME=dashy
PWD=/app
TZ=Europe/London
/app # yarn health-check
yarn run v1.22.15
$ node services/healthcheck
[Thu Aug 11 2022 22:00:00 GMT+0100 (British Summer Time)] Running health check...
Healthceck Failed, Error: ECONNREFUSED
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
```
### Additional info

No response

### Please tick the boxes
- [X] You have explained the issue clearly, and included all relevant info
- [X] You are using a supported version of Dashy
- [X] You've checked that this issue hasn't already been raised
- [X] You've checked the docs and troubleshooting guide
- [X] You agree to the code of conduct
If you're enjoying Dashy, consider dropping us a ⭐
🤖 I'm a bot, and this message was automated
Please note that startup can fail if you did not allocate enough memory (>1GB). Either way, startup will be very slow; this seems to be under heavy rework, see #799
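If memory is the suspect, the limit can be raised in the Compose file. A sketch only (the values are illustrative, and `mem_limit`/`mem_reservation` assume the non-swarm Compose file format):

```yaml
# Sketch: give the container enough memory to start (illustrative values)
dashy:
  image: lissy93/dashy:latest
  mem_limit: 2g        # hard cap; keep it comfortably above 1GB
  mem_reservation: 1g  # soft guarantee
```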
This issue has gone 6 weeks without an update. To keep the ticket open, please indicate that it is still relevant in a comment below. Otherwise it will be closed in 5 working days.
This issue was automatically closed because it has been stalled for over 6 weeks with no activity.
Any progress on this issue? I am having the same problem: I leave the Dashy container running for over 5 minutes, but `docker-compose ps` still shows dashy as `health: starting`. This is a big issue because the Traefik router is not created, so I cannot access it (other than going to the exposed port directly).
Docker-compose (dashy service):

```yaml
dashy:
  image: "lissy93/dashy:latest"
  container_name: Dashy
  volumes:
    - "./dashy/config.yml:/app/public/conf.yml"
  ports:
    - "8000:80"
```
Edit: never mind, looks like it needs 5+ mins before it says healthy. Why?
@pw-64,

> Edit: never mind, looks like it needs 5+ mins before it says healthy. Why?
I was running into this same issue, dug into it, and found a solution. The problem is with the healthcheck definition in the Dockerfile:
```dockerfile
HEALTHCHECK --interval=5m --timeout=5s --start-period=30s CMD yarn health-check
```
According to the Docker documentation:

> The health check will first run interval seconds after the container is started

It will wait at least `--start-period` before performing the first check, but because `--interval` is larger than `--start-period`, the first check doesn't run until the five-minute mark. Until then, the container sits in the `health: starting` state. Docker Engine v25 added the `--start-interval` option to combat this, allowing a separate interval for the initial checks compared to the recurring ones.
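On Docker Engine 25+, the fix could live in the Dockerfile itself. A sketch only, not the project's actual healthcheck definition:

```dockerfile
# Sketch: requires Docker Engine 25+, which understands --start-interval.
# During the 30s start period, probe every 5s; afterwards, every 5m as before.
HEALTHCHECK --interval=5m --timeout=5s --start-period=30s --start-interval=5s \
  CMD yarn health-check
```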
Sadly, I'm in an environment that is stuck on an older version of Docker, so using `--start-interval` wasn't an option for me. My solution was to add a custom healthcheck to my docker-compose file:
```yaml
dashy:
  image: lissy93/dashy:latest
  container_name: dashy
  restart: unless-stopped
  healthcheck:
    test: ["CMD-SHELL", "yarn health-check"]
    start_period: 0s # 30s
    interval: 15s # 5m
    retries: 3
    timeout: 5s
  volumes:
    - /volume1/docker/dashy/conf.yml:/app/public/conf.yml
  networks:
    - services
  labels:
    traefik.enable: true
    traefik.http.routers.dashy.entrypoints: web, websecure
    traefik.http.routers.dashy.rule: Host(`dashy.home.local`)
    traefik.http.services.dashy.loadbalancer.server.port: 80
```
This means the health-check runs every 15 seconds instead of every 5 minutes, but it also means Traefik picks up the container after only 15 seconds when I need to restart it.
Ahh, interesting. Thank you for the update.
Awesome man, it solves the problem for me, thanks.
This should be all good now, after the latest update and with the ability to specify your own interval of choice. Let me know if it's still not working how you'd expect, and I'll reopen.