docker-compose status returns unhealthy while endpoint returns healthy
docker-compose ps returns:
Name               Command                         State            Ports
--------------------------------------------------------------------------------------------------------------
apps_shynet-db_1   docker-entrypoint.sh postgres   Up               5432/tcp
apps_shynet_1      ./entrypoint.sh                 Up (unhealthy)   0.0.0.0:49654->8080/tcp,:::49654->8080/tcp
curl 172.25.0.3:8080/healthz/?format=json (where the IP is the container's address) returns:
HTTP/1.1 200 OK
Server: gunicorn
Date: Thu, 13 Jan 2022 06:54:20 GMT
Connection: close
Content-Type: application/json
Expires: Thu, 13 Jan 2022 06:54:20 GMT
Cache-Control: max-age=0, no-cache, no-store, must-revalidate, private
X-Frame-Options: DENY
Content-Length: 67
X-Content-Type-Options: nosniff
Referrer-Policy: same-origin
{"Cache backend: default": "working", "DatabaseBackend": "working"}
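This looks like unintended behaviour: Docker reports the container as unhealthy even though the health endpoint answers 200 OK.
It prevents strict ingress controllers like Traefik from routing to the container, because they filter out any container that fails its health check.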
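To see why Docker marks the container unhealthy while the endpoint answers 200, it can help to look at what the image's own HEALTHCHECK command returned. Docker records this under `.State.Health`, which `docker inspect --format '{{json .State.Health}}' apps_shynet_1` prints as JSON. The following is only a minimal sketch for summarizing that JSON; the helper name `summarize_health` and the sample values are made up for illustration and are not output from this setup.

```python
import json


def summarize_health(health_json: str) -> dict:
    """Summarize the JSON printed by
    `docker inspect --format '{{json .State.Health}}' <container>`.

    Hypothetical helper, not part of Docker or Shynet.
    """
    # A container without a HEALTHCHECK prints "null", so fall back to {}.
    health = json.loads(health_json) or {}
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }


if __name__ == "__main__":
    # Illustrative sample only; real values come from `docker inspect`.
    sample = json.dumps(
        {
            "Status": "unhealthy",
            "FailingStreak": 3,
            "Log": [
                {"ExitCode": 1, "Output": "<output of the HEALTHCHECK command>\n"}
            ],
        }
    )
    print(summarize_health(sample))
```

The `last_output` field is usually the most useful part, since it holds whatever the health check command itself printed before exiting non-zero.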
More context:
startup logs (no warnings):
shynet_1 | Launching Shynet web server...
shynet_1 | [2021-12-30 13:55:35 +0000] [1] [INFO] Starting gunicorn 20.1.0
shynet_1 | [2021-12-30 13:55:35 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
shynet_1 | [2021-12-30 13:55:35 +0000] [1] [INFO] Using worker: sync
shynet_1 | [2021-12-30 13:55:35 +0000] [9] [INFO] Booting worker with pid: 9
traefik | time="2021-12-30T13:55:38Z" level=debug msg="Filtering unhealthy or starting container" providerName=docker container=shynet-apps-d407b653cda6c44be6193efe8da6c8fd44a6e8c798957e44d2a20ea3068f75dc
Compose file:
version: '3'
services:
  traefik:
    restart: always
    image: traefik:2.5
    container_name: traefik
    command:
      - --entrypoints.web.address=:80
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --log=true
      - --log.level=DEBUG
    ports:
      - 80:80
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
  shynet:
    image: shynet:docker-fix
    restart: unless-stopped
    ports:
      - '8080'
    environment:
      - DB_HOST=shynet-db
      - DB_NAME=shynet
      - DB_USER=shynet
      - DB_PASSWORD=shynet
      - DJANGO_SECRET_KEY=shynet
      - TIME_ZONE=Africa/Nairobi
    labels:
      - traefik.enable=true
      - traefik.port=8080
      # dns entry for api.local -> 127.0.0.1
      - traefik.http.routers.analytics.rule=Host(`api.local`)
    depends_on:
      - shynet-db
      - traefik
  shynet-db:
    image: postgres:13-alpine
    restart: always
    environment:
      - POSTGRES_DB=shynet
      - POSTGRES_USER=shynet
      - POSTGRES_PASSWORD=shynet
    volumes:
      - ./shynet_db:/var/lib/postgresql/data
As a temporary fix, I am removing the HEALTHCHECK from the Dockerfile entirely and rebuilding the image. That works in my case with Traefik.
What is the HTTP status code of curl 172.25.0.3:8080/healthz/?format=json? What criteria does Traefik use to determine whether a health check is healthy or not?
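One way to answer the status-code question, and to try the endpoint with different Host headers, is a small probe script. This is only a sketch using the Python standard library; `probe_status` is a hypothetical helper name, and the URL in the usage comment is the one from the report above.

```python
import urllib.error
import urllib.request


def probe_status(url, host_header=None, timeout=5.0):
    """Return the HTTP status code for a GET request to `url`.

    `host_header` optionally overrides the Host header, which is what
    Django validates against ALLOWED_HOSTS. Hypothetical helper.
    """
    headers = {"Host": host_header} if host_header else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is still what we want.
        return error.code


# Usage (requires network access to the container):
#   probe_status("http://172.25.0.3:8080/healthz/?format=json")
#   probe_status("http://172.25.0.3:8080/healthz/?format=json", host_header="api.local")
```

Comparing the result with and without an explicit Host header shows whether the endpoint's answer depends on host validation.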
I think I was seeing the same issue... this happens when ALLOWED_HOSTS is either unset or still set to the default *:
Performing startup checks...
Database is ready to go.
Startup checks complete!
Launching Shynet web server...
[2022-01-01 13:21:50 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-01-01 13:21:50 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-01-01 13:21:50 +0000] [1] [INFO] Using worker: sync
[2022-01-01 13:21:50 +0000] [10] [INFO] Booting worker with pid: 10
ERROR Invalid HTTP_HOST header: '*'. The domain name provided is not valid according to RFC 1034/1035.
So I'm assuming the new health check does not yet handle the specific case of ALLOWED_HOSTS being set to *. After I set ALLOWED_HOSTS to an explicit list of host names, the health check worked and my container started as expected again.
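The error above is consistent with the health check sending the literal `*` as its Host header: Django first checks that the header is a syntactically valid host name, and `*` is not one. A minimal sketch of one way to avoid that is to skip wildcard entries when choosing the host for the internal request. The function name `pick_healthcheck_host` and the `localhost` fallback are assumptions for illustration; this is not necessarily how Shynet's actual fix works.

```python
def pick_healthcheck_host(allowed_hosts, fallback="localhost"):
    """Pick a concrete host name to send as the Host header of an
    internal health check request.

    Hypothetical helper: `allowed_hosts` is the list from Django's
    ALLOWED_HOSTS setting; `fallback` is an assumed default.
    """
    for host in allowed_hosts:
        host = host.strip()
        # "*" means "accept any host" but is not itself a valid host name.
        if not host or host == "*":
            continue
        # Django treats ".example.com" as a subdomain wildcard that also
        # matches the bare domain, so the bare domain is a safe choice.
        if host.startswith("."):
            host = host[1:]
        return host
    # Only wildcards (or nothing) configured: use a syntactically valid name.
    return fallback


if __name__ == "__main__":
    print(pick_healthcheck_host(["*"]))
    print(pick_healthcheck_host(["*", "api.local"]))
```

With ALLOWED_HOSTS set to `*`, any syntactically valid host is accepted, so the fallback should pass validation. With ALLOWED_HOSTS unset, whether `localhost` is accepted depends on the DEBUG setting.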
Ugh, when designing the healthcheck command I didn't consider ALLOWED_HOSTS == "*". I'll open a pull request in a moment. Thanks for the issue.
Thanks for making this fix!
@fnwbr could you please check if this PR fixes your problem?
Hey, thanks for the quick fix! I'm unable to test this out at the moment, sorry. 😕 Maybe @kamikazechaser can?
I'll try and test out the fix within the week and get back.
I tested the fix in #186 and it did not work. I have updated my original comment with the full docker-compose.yml that I am using with traefik.
Hmmm...
@kamikazechaser so you are not setting ALLOWED_HOSTS at all?