Show that health checks are failing in the UI
To Reproduce
- Create a Dokploy project.
- Create a compose deployment like the following:
Compose:

```yaml
services:
  keycloak-d8fbf0c3:
    container_name: keycloak-app-d8fbf0c3
    image: quay.io/keycloak/keycloak:latest
    restart: always
    environment:
      KEYCLOAK_ADMIN: ${KEYCLOAK_USER}
      KEYCLOAK_ADMIN_PASSWORD: ${KEYCLOAK_PASSWORD}
      KC_DB: postgres
      KC_DB_URL: jdbc:postgresql://keycloak-db-d8fbf0c3:5432/keycloak
      KC_DB_USERNAME: ${POSTGRES_USER}
      KC_DB_PASSWORD: ${POSTGRES_PASSWORD}
      KC_HTTP_ENABLED: 'true'
      KC_PROXY_HEADERS: xforwarded
      KC_HOSTNAME: ${KEYCLOAK_HOST}
    depends_on:
      - keycloak-db-d8fbf0c3
    networks:
      - keycloak-net-d8fbf0c3
    command:
      - start
  keycloak-db-d8fbf0c3:
    container_name: keycloak-db-d8fbf0c3
    image: postgres:latest
    restart: always
    shm_size: 128mb
    environment:
      POSTGRES_DB: keycloak
      POSTGRES_PORT: 5432
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    healthcheck:
      test: pg_isready -U $$POSTGRES_USER -d keycloak
      interval: 30s
      timeout: 60s
      retries: 5
      start_period: 80s
    volumes:
      - ../files/keycloak-db-d8fbf0c3/data:/var/lib/postgresql/data
    networks:
      - keycloak-net-d8fbf0c3

networks:
  keycloak-net-d8fbf0c3:
    name: keycloak-net-d8fbf0c3
    driver: bridge
```
ENV Variables:
KEYCLOAK_USER=admin
KEYCLOAK_PASSWORD=helloworld
KEYCLOAK_HOST=keycloak.dev.pfadi-meilen-herrliberg.ch
POSTGRES_USER=postgres
POSTGRES_PASSWORD=helloworld
Domain mapping: keycloak.dev.pfadi-meilen-herrliberg.ch, port 8080, path /, using Let's Encrypt
Current vs. Expected behavior
When a Docker Compose service has a failing health check, its domains never get assigned. The container still runs, with no sign that anything has gone wrong.
I would expect the UI to show something is wrong.
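For reference, the health state that never reaches the UI can be read directly from the Docker Engine. A minimal sketch using dockerode (the container name comes from the compose file above; the socket path assumes a default local Docker installation):

```ts
// check-health.ts – minimal sketch: read the health status Docker already tracks.
// Assumes a local Docker socket and the container name from the compose file above.
import Docker from "dockerode";

const docker = new Docker({ socketPath: "/var/run/docker.sock" });

async function main() {
  const info = await docker.getContainer("keycloak-db-d8fbf0c3").inspect();
  // State.Health is only present when the image or compose file defines a healthcheck.
  console.log(
    info.State.Status,                       // e.g. "running"
    info.State.Health?.Status,               // e.g. "unhealthy"
    info.State.Health?.Log?.at(-1)?.Output,  // output of the last healthcheck run
  );
}

main().catch(console.error);
```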
Provide environment information
Operating System:
OS: Ubuntu 20.04
Arch: amd64
Dokploy version: 0.20.8
VPS Provider: Azure Virtual Machine
What applications/services are you trying to deploy?
Keycloak
Which area(s) are affected? (Select all that apply)
Docker, Traefik, Docker Compose
Are you deploying the applications where Dokploy is installed or on a remote server?
Same server where Dokploy is installed
Additional context
No response
Will you send a PR to fix it?
No
I'm not sure about this issue. If the application internally has a problem, that is unrelated to anything Dokploy does. However, if the application really has a problem, you should check the logs in any case; if something is wrong or failing, the problem should show up there. The Docker logs seem to be fine, so I don't really know what the problem is.
I think having some indicator is still a good idea - sometimes after a machine reboot, a handful of services/applications have a "green" indicator in Dokploy because they were previously deployed, but they haven't actually restarted (or at least not successfully).
This is more clearly flagged in Coolify, so I don't need to wait until users report that there's an issue to go and investigate it.
It might sound like a good idea, but honestly I don't think it's good in terms of performance, especially for applications hosted on remote servers: SSH connections are expensive, and I wouldn't want people to have an additional problem. I think the system we are running works fine for the moment.
Hmm, good point on remote servers. If this feature does get implemented, could it be made opt-in (disabled by default)?
I wonder how other services deal with the SSH performance issue you mentioned. Perhaps some public API can be exposed per server, providing info on the healthcheck status of each server's resources - that way, HTTP can be utilised instead of SSH?
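As a rough illustration of that idea (nothing that exists in Dokploy today), such a per-server endpoint could be a tiny HTTP service that reads container state from the local Docker socket and returns it as JSON. Hypothetical sketch; the port is arbitrary and a real version would need authentication/TLS:

```ts
// health-agent.ts – hypothetical per-server agent, not part of Dokploy.
// Exposes the health status of local containers over HTTP so the main
// instance could poll it instead of opening SSH connections.
import http from "node:http";
import Docker from "dockerode";

const docker = new Docker({ socketPath: "/var/run/docker.sock" });

http
  .createServer(async (_req, res) => {
    try {
      const containers = await docker.listContainers({ all: true });
      const report = containers.map((c) => ({
        name: c.Names[0]?.replace(/^\//, ""),
        state: c.State,   // e.g. "running", "exited"
        status: c.Status, // e.g. "Up 5 minutes (unhealthy)"
      }));
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(report));
    } catch (err) {
      res.writeHead(500);
      res.end(String(err));
    }
  })
  .listen(7654); // arbitrary port chosen for this sketch
```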
Yes, but that would require an agent or something like that on that server (a lot of work lol). It was an idea I had initially when developing remote servers, but I'd rather have access to the machine than have a single container running commands.
The agent is something similar to what we have in remote server monitoring https://docs.dokploy.com/docs/core/monitoring
Traefik supports direct active health checks using the --providers.docker.healthcheck option. This means that Traefik will only route requests to a container that reports as healthy. Not sure if that is an option that would provide some value.
This is something I was just thinking about while troubleshooting a bunch of different Docker Compose projects with multiple containers each. In production, I use an external monitoring service that can ping at least the Internet-exposed stuff. But before that's set up, and for local-only services, it would be very handy to have a nice green light not only on the service, but on each project on the projects page (when health checks pass for everything in the project). IMO the green light on the services should really be health-check based instead of deployment based too (but you could still go to the deployments tab to see that status).
I'm a novice when it comes to both Traefik and Docker Swarm and don't know either of their more advanced APIs. But if there isn't a simple-enough-for-now way to handle it across all the ways Dokploy manages things (local, remote servers over SSH or Swarm... applications, databases, compose, etc.), one idea to simplify it might be to just provide an optional field in the UI for a user-provided health check endpoint. This would ping any internal or external service the user wanted to set up. Maybe not the ideal way to solve it, but seeing a bunch of green lights telling you that all is good (or not) right when logging into Dokploy, without digging into logs, etc., would be a great feature. It could also be a new notification action if a health check fails after X retries.
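For illustration only, the project-level green light described above could roll per-service health up into a single badge. Type and status names here are invented; Dokploy's real models differ:

```ts
// Hypothetical helper: aggregate per-service health into one project badge.
type ServiceHealth = "healthy" | "unhealthy" | "starting" | "none";

function projectBadge(services: ServiceHealth[]): "green" | "yellow" | "red" {
  if (services.some((s) => s === "unhealthy")) return "red";
  if (services.some((s) => s === "starting")) return "yellow";
  return "green"; // every service is healthy or has no healthcheck defined
}

// Example: the Keycloak project above while the DB healthcheck is failing.
console.log(projectBadge(["none", "unhealthy"])); // "red"
```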
I think it could be a good idea to add a field to specify an endpoint, either internal or external, but we would be back to the same problem I mentioned: it would not be very effective for remote servers. If it is an internal service, we would have to go to the server via SSH and verify it, which is what I want to avoid.
This isn't really my area of expertise, but in case there isn't a more automated solution and you go with the endpoint idea, could it be up to the user to figure out how to complete the implementation? For example, if the health check is for a remote (SSH-managed) server, they could add an internet-accessible REST endpoint to their application, or allow the ping to make a HEAD request at the home page/login if there is one. Then Dokploy pings that using node fetch, axios, or whatever every X seconds. I don't know if that works in all cases or if it would have enough uptake to justify tying it to the main UI indicators. I just think, at least in theory, it would be a big upgrade.
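A minimal sketch of that polling loop, assuming Node 18+ global fetch. This is not existing Dokploy code; the endpoint, interval, and retry count are placeholders the user would configure, and the example URL is the domain from this issue:

```ts
// Hypothetical poller for a user-provided health endpoint.
// Marks the service unhealthy only after `retries` consecutive failures.
async function pollHealth(url: string, intervalMs = 30_000, retries = 3) {
  let failures = 0;
  setInterval(async () => {
    try {
      const res = await fetch(url, {
        method: "HEAD",
        signal: AbortSignal.timeout(5_000), // don't hang on dead endpoints
      });
      failures = res.ok ? 0 : failures + 1;
    } catch {
      failures += 1;
    }
    if (failures >= retries) {
      console.warn(`health check failed ${failures}x for ${url}`);
      // here Dokploy could flip the UI badge and/or fire a notification
    }
  }, intervalMs);
}

pollHealth("https://keycloak.dev.pfadi-meilen-herrliberg.ch/");
```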
@unleashit if it's a publicly accessible service, you can use monitoring tools like uptime-kuma.
I think having a healthcheck agent/daemon on each server to expose the internal healthcheck endpoint of all services (including non-public facing ones) as Siumauricio mentioned would be the most versatile, although it appears to require quite a bit of work.
Yeah, that sounds much more optimal. My suggestion was just a next-best option, since he seems reluctant, has other priorities, etc. I actually already use a service like Uptime Kuma, but I still personally think it would be valuable to have within Dokploy. Also, if the user wanted to take it that far, they could implement their own service-checking daemon and ping that. Lastly, this only applies to remote SSH, I think. If the service is part of the Swarm or local, Dokploy could default to querying those for the checks instead. Just kind of a fallback idea.
I think healthcheck status is mostly important because it's one of the main reasons why domains do not work – Traefik ignores unhealthy containers, which is not the first thing that beginners check. From the "ease-of-use" perspective, having a badge showing that there was some issue with Traefik exposing the service is a huge quality-of-life improvement. I personally have stumbled upon this issue several times, and I'd be grateful for a sign that lets me skip the guesswork and jump straight to the problem.