Show that healthchecks are failing in the UI

Open · TeamBattino opened this issue 9 months ago · 12 comments

To Reproduce

  1. Set up a Dokploy instance.
  2. Create a compose deployment like the following:

Compose:

services:
  keycloak-d8fbf0c3:
    container_name: keycloak-app-d8fbf0c3
    image: quay.io/keycloak/keycloak:latest
    restart: always
    environment:
      KEYCLOAK_ADMIN: ${KEYCLOAK_USER}
      KEYCLOAK_ADMIN_PASSWORD: ${KEYCLOAK_PASSWORD}
      KC_DB: postgres
      KC_DB_URL: jdbc:postgresql://keycloak-db-d8fbf0c3:5432/keycloak
      KC_DB_USERNAME: ${POSTGRES_USER}
      KC_DB_PASSWORD: ${POSTGRES_PASSWORD}
      KC_HTTP_ENABLED: 'true'
      KC_PROXY_HEADERS: xforwarded
      KC_HOSTNAME: ${KEYCLOAK_HOST}
    depends_on:
      - keycloak-db-d8fbf0c3
    networks:
      - keycloak-net-d8fbf0c3
    command:
      - start
  keycloak-db-d8fbf0c3:
    container_name: keycloak-db-d8fbf0c3
    image: postgres:latest
    restart: always
    shm_size: 128mb
    environment:
      POSTGRES_DB: keycloak
      POSTGRES_PORT: 5432
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    healthcheck:
      test: pg_isready -U $$POSTGRES_USER -d keycloak
      interval: 30s
      timeout: 60s
      retries: 5
      start_period: 80s
    volumes:
      - ../files/keycloak-db-d8fbf0c3/data:/var/lib/postgresql/data
    networks:
      - keycloak-net-d8fbf0c3
networks:
  keycloak-net-d8fbf0c3:
    name: keycloak-net-d8fbf0c3
    driver: bridge
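
As an aside, the database healthcheck above is also what Compose's conditional startup keys off. A minimal sketch of the stricter depends_on form (honored by docker compose up, though swarm stacks ignore depends_on entirely):

services:
  keycloak-d8fbf0c3:
    depends_on:
      keycloak-db-d8fbf0c3:
        # Start Keycloak only once the database healthcheck reports healthy.
        condition: service_healthy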

ENV Variables:

KEYCLOAK_USER=admin
KEYCLOAK_PASSWORD=helloworld
KEYCLOAK_HOST=keycloak.dev.pfadi-meilen-herrliberg.ch
POSTGRES_USER=postgres
POSTGRES_PASSWORD=helloworld

Domain mapping: Domain: keycloak.dev.pfadi-meilen-herrliberg.ch, Port: 8080, Path: /, using Let's Encrypt

Current vs. Expected behavior

When a Docker Compose service has a failing healthcheck, domains never get assigned. The container still runs, with no sign in the UI that anything has gone wrong.

I would expect the UI to show something is wrong.

Provide environment information

Operating System:
  OS: Ubuntu 20.04
  Arch: amd64
Dokploy version: 0.20.8
VPS Provider: Azure Virtual Machine
What applications/services are you trying to deploy?
  Keycloak

Which area(s) are affected? (Select all that apply)

Docker, Traefik, Docker Compose

Are you deploying the applications where Dokploy is installed or on a remote server?

Same server where Dokploy is installed

Additional context

No response

Will you send a PR to fix it?

No

TeamBattino · Mar 24 '25 19:03

I'm not sure about this issue. If the application internally has a problem, that is unrelated to anything Dokploy does. Either way, if the application really has a problem, you should check the logs; if something is wrong or failing, the problem should show up there. The Docker logs seem to be fine, so I don't really know what the problem is.

Siumauricio · Mar 27 '25 08:03

I think having some indicator is still a good idea. Sometimes after a machine reboot, a handful of services/applications have a "green" indicator in Dokploy because they were previously deployed, but they haven't actually restarted (or at least not successfully).

This is more clearly flagged in Coolify, so I don't need to wait until users report that there's an issue to go and investigate it.

nktnet1 · Mar 27 '25 21:03

It might sound like a good idea, but honestly I don't think it's good in terms of performance, especially for applications hosted on remote servers, because SSH connections are expensive and I wouldn't want to give people an additional problem. I think the system we are running works fine for the moment.

Siumauricio · Mar 28 '25 05:03

Hmm, good point on remote servers. If this feature does get implemented, could it be made opt-in (disabled by default)?

I wonder how other services deal with the SSH performance issue you mentioned. Perhaps some public API can be exposed per server, providing info on the healthcheck status of each server's resources - that way, HTTP can be utilised instead of SSH?
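
For what it's worth, one existing pattern along these lines is a read-only HTTP proxy in front of the Docker socket, so container state (including healthcheck status) can be queried over HTTP without SSH. A minimal sketch using the community tecnativa/docker-socket-proxy image (the image and its CONTAINERS flag follow that project's docs; network hardening is omitted here):

services:
  docker-socket-proxy:
    image: tecnativa/docker-socket-proxy
    restart: always
    environment:
      # Allow read-only access to the /containers API,
      # whose responses include each container's health status.
      CONTAINERS: 1
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - 2375:2375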

nktnet1 · Mar 31 '25 00:03

Yes, but that would require an agent or something like that to be on that server (a lot of work, lol). It was an idea I had initially when developing remote servers, but I'd rather have direct access to the machine than have a single container running commands.

The agent is something similar to what we have in remote server monitoring https://docs.dokploy.com/docs/core/monitoring

Siumauricio · Mar 31 '25 04:03

Traefik supports direct active health checks using the --providers.docker.healthcheck option. This means that Traefik will only route requests to a container that reports as healthy. Not sure if that is an option that would provide some value.
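
For reference, a minimal sketch of what an active Traefik health check could look like as labels on the Keycloak service from the repro above (label names follow the Traefik v2 docs; the path and interval values are assumptions, and Keycloak only serves /health/ready when KC_HEALTH_ENABLED is set to true):

services:
  keycloak-d8fbf0c3:
    environment:
      KC_HEALTH_ENABLED: 'true'
    labels:
      - traefik.enable=true
      # Traefik polls this endpoint and stops routing to the container
      # while the check fails.
      - traefik.http.services.keycloak.loadbalancer.healthcheck.path=/health/ready
      - traefik.http.services.keycloak.loadbalancer.healthcheck.interval=10s
      - traefik.http.services.keycloak.loadbalancer.healthcheck.timeout=3s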

jrparks · Mar 31 '25 06:03

This is something I was just thinking about while troubleshooting a bunch of different Docker Compose projects with multiple containers each. In production, I use an external monitoring service that can ping at least the internet-exposed stuff. But before that's set up, and for local-only services, it would be very handy to have a nice green light not only on each service, but on each project on the projects page (when health checks pass for everything in the project). IMO the green light should really be health-check-based instead of deployment-based for the services too (but you could still go to the deployments tab to see that status).

I'm a novice when it comes to both Traefik and Docker Swarm and don't know either of their more advanced APIs. But if there isn't a simple-enough-for-now way to handle it across all the ways Dokploy manages things (local, remote servers over SSH or Swarm... applications, databases, compose, etc.), one idea to simplify it might be to just provide an optional field in the UI for a user-provided health check endpoint. This would ping any internal or external service the user wanted to set up. Maybe not the ideal way to solve it, but seeing a bunch of green lights telling you that all is good (or not) right when logging into Dokploy, without digging into logs, etc., would be a great feature. It could also drive a new notification action if a health check fails after X retries.
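
In Compose terms, the kind of user-supplied check described above already has a close analog; a minimal sketch (the my-app image and /healthz path are hypothetical, and the image needs curl available):

services:
  my-app:
    image: my-app:latest
    healthcheck:
      # Hypothetical endpoint; any internal or external URL
      # the user chooses would work the same way.
      test: ['CMD', 'curl', '-fsS', 'http://localhost:8080/healthz']
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s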

unleashit · Mar 31 '25 21:03

I think it could be a good idea to add a field to specify an endpoint, either internal or external, but I think we would be back to the same problem I mentioned: it would not be very effective for remote servers. If it is an internal service, we would have to go to the server via SSH to verify it, which is what I want to avoid.

Siumauricio · Apr 04 '25 07:04

This isn't really my area of expertise, but in case there isn't a more automated solution and you go with the endpoint idea, could it be up to the user to figure out how to complete the implementation? For example, if the health check is for a remote (SSH-managed) server, they could add an internet-accessible REST endpoint to their application, or allow the ping to make a HEAD request against the home page/login if there is one. Then Dokploy pings that using node fetch, axios, or whatever every X seconds. I don't know if that works in all cases or whether it would have enough uptake to justify tying it to the main UI indicators. I just think, at least in theory, it would be a big upgrade.

unleashit · Apr 06 '25 21:04

@unleashit if it's a publicly accessible service, you can use monitoring tools like uptime-kuma.
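
(For anyone reading along: uptime-kuma is itself a single-container deployment; a minimal compose sketch, with the published port and volume name as assumptions:)

services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    restart: always
    ports:
      - 3001:3001
    volumes:
      - uptime-kuma-data:/app/data
volumes:
  uptime-kuma-data: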

I think having a healthcheck agent/daemon on each server to expose the internal healthcheck endpoint of all services (including non-public-facing ones), as Siumauricio mentioned, would be the most versatile option, although it appears to require quite a bit of work.

nktnet1 · Apr 06 '25 21:04

Yeah, that sounds much more optimal. My suggestion was just a next-best, since he seems reluctant, has other priorities, etc. I actually already use a service like uptime-kuma, but I still personally think it would be valuable to have within Dokploy. Also, if users wanted to take it that far, they could implement their own service-checking daemon and have Dokploy ping that. Lastly, this only applies to remote SSH, I think; if the service is part of the Swarm or local, Dokploy could default to querying those for the checks instead. Just kind of a fallback idea.

unleashit · Apr 06 '25 22:04

I think healthcheck status matters mostly because it's one of the main reasons why domains do not work: Traefik ignores unhealthy containers, which is not the first thing that beginners check. From an ease-of-use perspective, having a badge showing that there was some issue with Traefik exposing the service would be a huge quality-of-life improvement. I personally have stumbled upon this issue several times, and I'd be grateful for a sign that would let me skip the guesswork and jump straight to the problem.

FallenChromium · May 06 '25 08:05