[BUG] Health Check Stuck on Starting/Unhealthy on Docker

Open mort666 opened this issue 3 years ago • 3 comments

Environment

Self-Hosted (Docker)

System

Docker 20.x.x

Version

2.1.1

Describe the problem

I have Dashy running on a system with the environment variable PORT set to 4000. The container runs on Docker with host networking, because the status checks were not working for some other containerised hosts when running bridged.

The following shows the printenv output and the output from a manually run health check. I am not sure which port it is attempting to connect to, but I think it is trying to hit port 80. That port is currently in use by another service on the physical host, hence the PORT environment variable: Dashy would not start without the port change.

/app # printenv
NODE_VERSION=16.13.2
HOSTNAME=0762cc75728d
YARN_VERSION=1.22.15
SHLVL=1
PORT=4000
HOME=/root
HOST_OS=Ubuntu
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
GID=100
UID=99
DIRECTORY=/app
IS_DOCKER=true
HOST_HOSTNAME=rXXXXXXXXXX1
HOST_CONTAINERNAME=dashy
PWD=/app
TZ=Europe/London
/app # yarn health-check
yarn run v1.22.15
$ node services/healthcheck
[Thu Aug 11 2022 22:00:00 GMT+0100 (British Summer Time)] Running health check...
Healthceck Failed, Error: ECONNREFUSED
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
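
One way to test the port theory would be to override the image's health check with a probe aimed explicitly at the configured port. The fragment below is only a sketch under assumptions: it assumes the image ships BusyBox `wget`, that Dashy answers a plain HTTP request on `/` with a success status, and that the service is named `dashy`. None of this is confirmed by the report.

```yaml
services:
  dashy:
    image: lissy93/dashy:latest
    network_mode: host            # host networking, as in the report
    environment:
      - PORT=4000                 # port Dashy is told to listen on
    healthcheck:
      # Hypothetical probe: request the dashboard on the same port as PORT.
      # Assumes wget is available inside the container.
      test: ["CMD-SHELL", "wget -q --spider http://localhost:4000/ || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
```

If this probe passes while `yarn health-check` still reports ECONNREFUSED, that would suggest the built-in check is not using the PORT value.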

Additional info

No response

mort666 avatar Aug 11 '22 21:08 mort666

If you're enjoying Dashy, consider dropping us a ⭐
🤖 I'm a bot, and this message was automated

liss-bot avatar Aug 11 '22 21:08 liss-bot

Please note that the start can fail if you did not allocate enough memory (it needs a lot, more than 1 GB). Either way, the start will be very slow, and this area seems to be under heavy rework; see #799.

chevdor avatar Aug 12 '22 07:08 chevdor

This issue has gone 6 weeks without an update. To keep the ticket open, please indicate that it is still relevant in a comment below. Otherwise it will be closed in 5 working days.

liss-bot avatar Sep 12 '22 01:09 liss-bot

This issue was automatically closed because it has been stalled for over 6 weeks with no activity.

liss-bot avatar Sep 18 '22 01:09 liss-bot

Any progress on this issue? I am having the same problem: I leave the Dashy container running for over 5 minutes, but docker-compose ps still shows it as health:starting. This is a big issue because the Traefik router is not created, so I cannot access it (other than by going to the exposed port directly).

Docker-compose (dashy service):

dashy:
    image: "lissy93/dashy:latest"
    container_name: Dashy
    volumes:
      - "./dashy/config.yml:/app/public/conf.yml"
    ports:
      - "8000:80"

Edit: never mind, looks like it needs 5+ mins before it says healthy. Why?

pw-64 avatar Jun 26 '23 13:06 pw-64

@pw-64,

Edit: never mind, looks like it needs 5+ mins before it says healthy. Why?

I was running into this same issue, dug into it, and found a solution. The problem is with the healthcheck definition in the Dockerfile:

HEALTHCHECK --interval=5m --timeout=5s --start-period=30s CMD yarn health-check

According to the Docker documentation:

The health check will first run interval seconds after the container is started

So the first check only runs one full --interval after startup, which here is the five-minute mark. --start-period does not bring it forward; it only keeps failures during the first 30 seconds from counting toward the retry limit. Until that first check passes, the container stays in the health:starting state. Docker Engine v25 added the --start-interval option to address this, allowing a shorter interval for the initial checks than for the recurring ones.
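
For anyone on Docker Engine 25 or newer, the same idea can be expressed in Compose with `start_interval`. This is a sketch rather than a tested configuration: it assumes a Compose version that understands `start_interval`, and the 5s value is only an illustration.

```yaml
services:
  dashy:
    image: lissy93/dashy:latest
    healthcheck:
      test: ["CMD-SHELL", "yarn health-check"]
      start_period: 30s     # failures in this window do not count toward retries
      start_interval: 5s    # interval between checks during the start period (Engine 25+)
      interval: 5m          # regular interval once the start period is over
      timeout: 5s
      retries: 3
```

This keeps the image's five-minute cadence for routine checks while letting the container leave health:starting soon after it comes up.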

Sadly, I'm in an environment that is stuck on an older version of Docker, so using --start-interval wasn't an option for me. My solution was to add a custom healthcheck to my docker-compose:

dashy:
    image: lissy93/dashy:latest
    container_name: dashy
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "yarn health-check"]
      start_period: 0s #30s
      interval: 15s #5m
      retries: 3
      timeout: 5s
    volumes:
      - /volume1/docker/dashy/conf.yml:/app/public/conf.yml
    networks:
      - services
    labels:
      traefik.enable: "true"   # quoted: some Compose versions reject bare booleans in labels
      traefik.http.routers.dashy.entrypoints: web, websecure
      traefik.http.routers.dashy.rule: Host(`dashy.home.local`)
      traefik.http.services.dashy.loadbalancer.server.port: 80

This means the health-check runs every 15 seconds instead of every 5 minutes, but it also means Traefik picks up the container after only 15 seconds when I need to restart it.

Drakmyth avatar Feb 21 '24 14:02 Drakmyth

Ahh, interesting. Thank you for the update.

pw-64 avatar Feb 21 '24 15:02 pw-64

Awesome man, it solves the problem for me, thanks.

alexkander avatar Mar 18 '24 03:03 alexkander

This should be all good now, after the latest update and with specifying your own interval of choice. Let me know if it's still not working as you'd expect, and I'll reopen.

Lissy93 avatar Apr 21 '24 23:04 Lissy93