[Bug]: After a month of running, all Apps respond with "Bad gateway"

Open henk23 opened this issue 1 year ago • 13 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Example public repository

coolify

Description

Running one NodeJS App and one Directus Service, after 4 weeks of good service, the server, including the Coolify Admin App, suddenly responds with "Bad Gateway".

Unfortunately, there is no hint whatsoever in any of the Docker logs. All services responded healthily until shortly before midnight, then the logs just stop.

install.sh --restart did not help. What helped was to restart all of the docker containers with docker restart $(docker ps -a -q)
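The one-line restart above can be expanded into a small loop that restarts containers one at a time and reports which ones fail, which may help spot a single misbehaving container. This is only a minimal sketch, not something taken from Coolify; the function name `restart_all_containers` is made up for illustration, and it assumes the `docker` CLI is on `PATH`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Coolify): restart every container,
# running or stopped, one by one and report failures.
restart_all_containers() {
    failed=0
    # `docker ps -a -q` prints the IDs of all containers, one per line.
    for id in $(docker ps -a -q); do
        if docker restart "$id" >/dev/null; then
            echo "restarted $id"
        else
            echo "FAILED    $id" >&2
            failed=$((failed + 1))
        fi
    done
    # Exit status is non-zero if any container could not be restarted.
    [ "$failed" -eq 0 ]
}
```

After sourcing the file, run `restart_all_containers`; failures go to stderr, so they stay visible even when stdout is redirected.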

Steps To Reproduce

  1. Start some services
  2. Wait 4 weeks

Version

3.12.30

henk23 avatar May 17 '23 07:05 henk23

Today I also got this, but in my case I suspect the culprit is an update to the Docker packages.

agdolla avatar May 17 '23 18:05 agdolla

I also got this after updating my Ubuntu server and Docker. After a server reboot, the Bad Gateway error goes away, and I can access Coolify dashboard, but all the Docker containers show Application Error, and when I try to visit the websites it gives me a 404 page not found. Redeploying all containers works, but if I restart the server, the same thing happens. All containers show Application Error.

These are the logs for a static site container (nginx:alpine):
2023-05-20T16:55:20.192512990Z /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
2023-05-20T16:55:20.192559863Z /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
2023-05-20T16:55:20.196721301Z /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
2023-05-20T16:55:20.211969122Z 10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
2023-05-20T16:55:20.246529355Z 10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
2023-05-20T16:55:20.249246198Z /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
2023-05-20T16:55:20.266276467Z /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
2023-05-20T16:55:20.274232430Z /docker-entrypoint.sh: Configuration complete; ready for start up

These are the logs for a PHP container (webdevops/php-nginx:7.4-alpine):
2023-05-20T17:01:05.597540118Z nginx: [warn] the "ssl" directive is deprecated, use the "listen ... ssl" directive instead in /opt/docker/etc/nginx/vhost.ssl.conf:1
2023-05-20T17:01:05.615388378Z [2023-05-20T17:01:05.614789] WARNING: With use-dns(no), dns-cache() will be forced to 'no' too!;
2023-05-20T17:01:05.748581062Z [20-May-2023 17:01:05] NOTICE: fpm is running, pid 47
2023-05-20T17:01:05.750284394Z [20-May-2023 17:01:05] NOTICE: ready to handle connections
2023-05-20T17:01:10.462612588Z [php-fpm:access] 127.0.0.1 - 20/May/2023:17:01:10 +0000 "GET /index.php" 200 /app/index.php 68.355 2048 87.78%
2023-05-20T17:08:14.264981153Z [php-fpm:access] 127.0.0.1 - 20/May/2023:17:08:14 +0000 "GET /index.php" 200 /app/index.php 27.720 2048 72.15%
2023-05-20T17:10:28.581821198Z [php-fpm:access] 127.0.0.1 - 20/May/2023:17:10:28 +0000 "GET /index.php" 200 /app/index.php 21.098 2048 94.80%
2023-05-20T17:10:28.984805474Z [php-fpm:access] 127.0.0.1 - 20/May/2023:17:10:28 +0000 "GET /index.php" 200 /app/index.php 16.444 2048 60.81%
2023-05-20T17:01:01.042195039Z -> Executing /opt/docker/provision/entrypoint.d/05-permissions.sh
2023-05-20T17:01:01.042337444Z -> Executing /opt/docker/provision/entrypoint.d/20-nginx.sh
2023-05-20T17:01:01.068801016Z -> Executing /opt/docker/provision/entrypoint.d/20-php-fpm.sh
2023-05-20T17:01:01.124697428Z -> Executing /opt/docker/provision/entrypoint.d/20-php.sh
2023-05-20T17:01:01.138280305Z -> Executing /opt/docker/provision/entrypoint.d/30-entrypoint.sh
2023-05-20T17:01:04.214634857Z -> Executing /opt/docker/bin/service.d/supervisor.d//10-init.sh
2023-05-20T17:01:04.552373094Z 2023-05-20 17:01:04,551 INFO Included extra file "/opt/docker/etc/supervisor.d/cron.conf" during parsing
2023-05-20T17:01:04.552437530Z 2023-05-20 17:01:04,551 INFO Included extra file "/opt/docker/etc/supervisor.d/dnsmasq.conf" during parsing
2023-05-20T17:01:04.552453244Z 2023-05-20 17:01:04,551 INFO Included extra file "/opt/docker/etc/supervisor.d/nginx.conf" during parsing
2023-05-20T17:01:04.552465895Z 2023-05-20 17:01:04,552 INFO Included extra file "/opt/docker/etc/supervisor.d/php-fpm.conf" during parsing
2023-05-20T17:01:04.552478463Z 2023-05-20 17:01:04,552 INFO Included extra file "/opt/docker/etc/supervisor.d/postfix.conf" during parsing
2023-05-20T17:01:04.552491000Z 2023-05-20 17:01:04,552 INFO Included extra file "/opt/docker/etc/supervisor.d/ssh.conf" during parsing
2023-05-20T17:01:04.552503263Z 2023-05-20 17:01:04,552 INFO Included extra file "/opt/docker/etc/supervisor.d/syslog.conf" during parsing
2023-05-20T17:01:04.552515448Z 2023-05-20 17:01:04,552 INFO Set uid to user 0 succeeded
2023-05-20T17:01:04.558555048Z 2023-05-20 17:01:04,558 INFO RPC interface 'supervisor' initialized
2023-05-20T17:01:04.558798208Z 2023-05-20 17:01:04,558 INFO supervisord started with pid 1
2023-05-20T17:01:05.562847449Z 2023-05-20 17:01:05,562 INFO spawned: 'syslogd' with pid 45
2023-05-20T17:01:05.567372134Z 2023-05-20 17:01:05,566 INFO spawned: 'nginxd' with pid 46
2023-05-20T17:01:05.586235129Z 2023-05-20 17:01:05,573 INFO spawned: 'php-fpmd' with pid 47
2023-05-20T17:01:05.592060994Z 2023-05-20 17:01:05,591 INFO spawned: 'crond' with pid 48
2023-05-20T17:01:05.593284039Z -> Executing /opt/docker/bin/service.d/syslog-ng.d//10-init.sh
2023-05-20T17:01:05.593428587Z -> Executing /opt/docker/bin/service.d/nginx.d//10-init.sh
2023-05-20T17:01:05.594176746Z -> Executing /opt/docker/bin/service.d/php-fpm.d//10-init.sh
2023-05-20T17:01:05.594190410Z Setting php-fpm user to application
2023-05-20T17:01:05.596502502Z 2023-05-20 17:01:05,596 INFO success: nginxd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2023-05-20T17:01:05.596656424Z 2023-05-20 17:01:05,596 INFO success: php-fpmd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2023-05-20T17:01:05.596765801Z 2023-05-20 17:01:05,596 INFO success: crond entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2023-05-20T17:01:05.605708933Z -> Executing /opt/docker/bin/service.d/cron.d//10-init.sh
2023-05-20T17:01:05.616391788Z [SYSLOG] syslog-ng[45]: syslog-ng starting up; version='3.36.1'
2023-05-20T17:01:06.752087376Z 2023-05-20 17:01:06,751 INFO success: syslogd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

The logs of the other containers also look as if everything succeeded, but the status of each app is DEGRADED. The databases and services are all stopped. I'm assuming this is because of some breaking changes in the Docker upgrade.

I'm going to revert my server to the snapshot that I took before upgrading my server until this is fixed. And I hope this will be fixed soon, because the upgrade included a critical patch to the kernel for a zero-day exploit! 😕

danieldogeanu avatar May 20 '23 18:05 danieldogeanu

Can you please paste here the Docker Engine version (current & available upgrade) + /etc/docker/daemon.json?
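For anyone else hitting this, the requested details can be gathered in one go. The following is a rough sketch only; the function name `docker_report` and the `DAEMON_JSON` override are invented for illustration, and it assumes the `docker` CLI is installed and the daemon is reachable.

```shell
#!/bin/sh
# Hypothetical helper: print the Docker Engine versions and daemon.json.
# DAEMON_JSON is an assumed override; /etc/docker/daemon.json is the default path.
docker_report() {
    daemon_json="${DAEMON_JSON:-/etc/docker/daemon.json}"
    echo "== Docker Engine (client / server) =="
    # Go-template output of `docker version`; fails if the daemon is down.
    docker version --format 'client={{.Client.Version}} server={{.Server.Version}}'
    echo "== $daemon_json =="
    if [ -r "$daemon_json" ]; then
        cat "$daemon_json"
    else
        echo "(not found or not readable)"
    fi
}
```

On Debian or Ubuntu, and assuming Docker was installed from the `docker-ce` package, `apt-cache policy docker-ce` additionally shows the installed version next to the candidate upgrade.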

andrasbacsai avatar May 24 '23 18:05 andrasbacsai

My Docker Engine & /etc/docker/daemon.json are as follows:

Upgraded Non-Working:

Coolify Versions: v3.12.31, v3.12.32 (both have the same behavior)

Docker Engine Version 24.0.1
Client: Docker Engine - Community
 Version:           24.0.1
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        6802122
 Built:             Fri May 19 18:06:21 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.1
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       463850e
  Built:            Fri May 19 18:06:21 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Contents of /etc/docker/daemon.json:
{
    "log-driver": "json-file",
    "log-opts": {
      "max-size": "100m",
      "max-file": "5"
    },
    "features": {
        "buildkit": true
    },
    "live-restore": true,
    "default-address-pools" : [
    {
      "base" : "172.17.0.0/12",
      "size" : 20
    },
    {
      "base" : "192.168.0.0/16",
      "size" : 24
    }
  ]
}

Working Non-Upgraded:

Coolify Version: v3.12.31

Docker Engine Version 23.0.4
Client: Docker Engine - Community
 Version:           23.0.4
 API version:       1.42
 Go version:        go1.19.8
 Git commit:        f480fb1
 Built:             Fri Apr 14 10:32:03 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.4
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.8
  Git commit:       cbce331
  Built:            Fri Apr 14 10:32:03 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.20
  GitCommit:        2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc:
  Version:          1.1.5
  GitCommit:        v1.1.5-0-gf19387a
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Contents of /etc/docker/daemon.json:
{
    "log-driver": "json-file",
    "log-opts": {
      "max-size": "100m",
      "max-file": "5"
    },
    "features": {
        "buildkit": true
    },
    "live-restore": true,
    "default-address-pools" : [
    {
      "base" : "172.17.0.0/12",
      "size" : 20
    },
    {
      "base" : "192.168.0.0/16",
      "size" : 24
    }
  ]
}

And in case it makes any difference, my Ubuntu version is: 22.04.2 LTS (GNU/Linux 5.15.0-72-generic x86_64)

Let me know if there's any other logs/files I can provide you with!

danieldogeanu avatar May 25 '23 12:05 danieldogeanu

I also have this issue after upgrading my Raspberry Pi to the latest packages via apt-get.

Building my Next.js app keeps crashing Coolify; if I wait a few minutes, it starts back up.

The build shows this, then crashes:

[Screenshot: CleanShot 2023-05-25 at 12 32 34]

Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
Client: Docker Engine - Community
 Version:           24.0.1
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        6802122
 Built:             Fri May 19 18:05:49 2023
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.1
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       463850e
  Built:            Fri May 19 18:05:49 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
pi@raspberrypi:~ $ cat /etc/docker/daemon.json
{
    "log-driver": "json-file",
    "log-opts": {
      "max-size": "100m",
      "max-file": "5"
    },
    "features": {
        "buildkit": true
    },
    "live-restore": true,
    "default-address-pools" : [
    {
      "base" : "172.17.0.0/12",
      "size" : 20
    },
    {
      "base" : "192.168.0.0/16",
      "size" : 24
    }
  ]
}

I downgraded to try to make this work:

pi@raspberrypi:~ $ docker --version
Docker version 23.0.6, build ef23cbc
pi@raspberrypi:~ $ sudo sh get-docker.sh --version 23.0.6

Same error on this version:

[Screenshot: CleanShot 2023-05-25 at 12 58 08]

[Screenshot: CleanShot 2023-05-25 at 12 58 35@2x]

[Screenshot: CleanShot 2023-05-25 at 12 59 55]

Trying 23.0.4 next (sudo sh get-docker.sh --version 23.0.4), since @danieldogeanu says it is working.

Geczy avatar May 25 '23 16:05 Geczy

@danieldogeanu did you check Coolify v3.12.32 on Docker 23.0.4?

Geczy avatar May 25 '23 17:05 Geczy

Still the same issue on 3.12.31 and Docker 23.0.4: a complete crash during the build.

[Screenshot: CleanShot 2023-05-25 at 13 24 50]

Geczy avatar May 25 '23 17:05 Geczy

{
  error: Error: Command failed with exit code 1: docker cp /tmp/cli3eta2k0000pg9s7q6xutq7-key.pem coolify-proxy:/etc/traefik/acme/custom/
  lstat /tmp/cli3eta2k0000pg9s7q6xutq7-key.pem: no such file or directory
      at makeError (file:///app/node_modules/.pnpm/[email protected]/node_modules/execa/lib/error.js:59:11)
      at handlePromise (file:///app/node_modules/.pnpm/[email protected]/node_modules/execa/index.js:119:26)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async executeCommand (/app/lib/common.js:701:12)
      at async copyLocalCertificates (/app/index.js:509:5)
      at async file:///app/node_modules/.pnpm/[email protected]/node_modules/p-map/index.js:141:20 {
    shortMessage: 'Command failed with exit code 1: docker cp /tmp/cli3eta2k0000pg9s7q6xutq7-key.pem coolify-proxy:/etc/traefik/acme/custom/',
    command: 'docker cp /tmp/cli3eta2k0000pg9s7q6xutq7-key.pem coolify-proxy:/etc/traefik/acme/custom/',
    escapedCommand: 'docker cp "/tmp/cli3eta2k0000pg9s7q6xutq7-key.pem" "coolify-proxy:/etc/traefik/acme/custom/"',
    exitCode: 1,
    signal: undefined,
    signalDescription: undefined,
    stdout: '',
    stderr: 'lstat /tmp/cli3eta2k0000pg9s7q6xutq7-key.pem: no such file or directory',
    failed: true,
    timedOut: false,
    isCanceled: false,
    killed: false
  }
}
Error: Command failed with exit code 1: find /tmp/ -maxdepth 1 -type f -name *-*.pem -delete
find: cannot delete '/tmp/cli3eta2k0000pg9s7q6xutq7-cert.pem': No such file or directory
find: cannot delete '/tmp/cli3eta2k0000pg9s7q6xutq7-key.pem': No such file or directory
    at makeError (file:///app/node_modules/.pnpm/[email protected]/node_modules/execa/lib/error.js:59:11)
    at handlePromise (file:///app/node_modules/.pnpm/[email protected]/node_modules/execa/index.js:119:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async executeCommand (/app/lib/common.js:701:12)
    at async copySSLCertificates (/app/index.js:475:5)
    at async Timeout._onTimeout (/app/index.js:156:7) {
  shortMessage: 'Command failed with exit code 1: find /tmp/ -maxdepth 1 -type f -name *-*.pem -delete',
  command: 'find /tmp/ -maxdepth 1 -type f -name *-*.pem -delete',
  escapedCommand: 'find "/tmp/" -maxdepth 1 -type f -name "*-*.pem" -delete',
  exitCode: 1,
  signal: undefined,
  signalDescription: undefined,
  stdout: '',
  stderr: "find: cannot delete '/tmp/cli3eta2k0000pg9s7q6xutq7-cert.pem': No such file or directory\n" +
    "find: cannot delete '/tmp/cli3eta2k0000pg9s7q6xutq7-key.pem': No such file or directory",
  failed: true,
  timedOut: false,
  isCanceled: false,
  killed: false
} { hide_meta: true }
 ELIFECYCLE  Command failed with exit code 1.

Finally found some logs from Coolify about this.
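The `find ... -delete` failure in the log is consistent with the `.pem` files having been removed by something else between listing and deletion, although the log alone does not confirm that. For comparison, a cleanup that treats "already gone" as success could look like the sketch below. This is not Coolify's implementation; `cleanup_pem_files` is a hypothetical name, and unlike `find` the shell glob skips hidden files.

```shell
#!/bin/sh
# Hypothetical cleanup helper, not Coolify's code.
# Removes <dir>/*-*.pem and does not fail if a file has already vanished.
cleanup_pem_files() {
    dir="${1:-/tmp}"
    status=0
    for f in "$dir"/*-*.pem; do
        # If nothing matches, the pattern stays literal; the -f test skips
        # that case as well as anything that is not a regular file.
        [ -f "$f" ] || continue
        # rm -f exits 0 even if the file disappeared after the test above;
        # other errors (for example permissions) are still reported.
        rm -f -- "$f" || status=1
    done
    return "$status"
}
```

Calling `cleanup_pem_files /tmp` twice in a row succeeds both times, since the second run simply finds nothing to delete.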

Geczy avatar May 25 '23 17:05 Geczy

> @danieldogeanu did you check Coolify v3.12.32 on Docker 23.0.4?

No, but I'm fairly sure it works. I will test this later or tomorrow.

Also, I don't think your problem is the same as ours. I have a problem with the entire server: after updating, all sites are down, none of them start automatically, and they return a Bad Gateway error. Redeploying and rebuilding works. I'm fairly sure Docker introduced some breaking changes in their latest version, which makes Coolify not work properly.

danieldogeanu avatar May 26 '23 07:05 danieldogeanu

Yeah, I solved mine by wiping the entire Coolify instance, including Docker volumes, and starting over, this time disabling the custom cert and just letting Coolify handle the Let's Encrypt certificate generation.

Geczy avatar May 26 '23 13:05 Geczy

> I'm fairly sure Docker introduced some breaking changes in their latest version, which makes Coolify not work properly.

Same for me, it would seem. @danieldogeanu by "Redeploying and rebuilding", do you mean restart coolify after the update and then rebuild all the apps?

narduin avatar Jun 02 '23 12:06 narduin

> by "Redeploying and rebuilding", do you mean restart coolify after the update and then rebuild all the apps?

@narduin Yes, restart Coolify and the entire server, and manually redeploy all the apps, one by one (Force Redeploy button). But if you have Unattended Upgrades configured to auto-restart the server, like I have, the apps won't start automatically after the next reboot. If you only do manual server restarts, you could just redeploy them and leave them like that.
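One thing that may be worth checking when containers do not come back after a reboot is their Docker restart policy, since the daemon only starts containers on its own when a policy such as `always` or `unless-stopped` is set. Whether that is what goes wrong here is not established in this thread. A minimal sketch for listing the policies follows; `list_restart_policies` is a made-up name and the `docker` CLI is assumed to be on `PATH`.

```shell
#!/bin/sh
# Hypothetical helper: list each container with its Docker restart policy.
list_restart_policies() {
    for name in $(docker ps -a --format '{{.Names}}'); do
        policy="$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$name")"
        # An empty value is shown as "no", i.e. no automatic restart.
        printf '%s\t%s\n' "$name" "${policy:-no}"
    done
}
```

Comparing the output on a working and a broken server might show whether the policies differ between the two.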

danieldogeanu avatar Jun 03 '23 09:06 danieldogeanu

Any progress on this? I've tried updating my server again and it's still a problem! I've updated to Docker Engine 24.0.2 and it doesn't fix the issue. For now, I've restored the server again, and just updated Coolify to v3.12.32 and it works with Docker Engine 23.0.4 perfectly fine. But as soon as I update Docker, everything breaks again! Here are the details:


Working Version

Docker Engine Version 23.0.4:
Client: Docker Engine - Community
 Version:           23.0.4
 API version:       1.42
 Go version:        go1.19.8
 Git commit:        f480fb1
 Built:             Fri Apr 14 10:32:03 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.4
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.8
  Git commit:       cbce331
  Built:            Fri Apr 14 10:32:03 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.20
  GitCommit:        2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc:
  Version:          1.1.5
  GitCommit:        v1.1.5-0-gf19387a
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Contents of /etc/docker/daemon.json:
{
    "log-driver": "json-file",
    "log-opts": {
      "max-size": "100m",
      "max-file": "5"
    },
    "features": {
        "buildkit": true
    },
    "live-restore": true,
    "default-address-pools" : [
    {
      "base" : "172.17.0.0/12",
      "size" : 20
    },
    {
      "base" : "192.168.0.0/16",
      "size" : 24
    }
  ]
}

Broken Version

Docker Engine Version 24.0.2:
Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:51:00 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:51:00 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Contents of /etc/docker/daemon.json:
{
    "log-driver": "json-file",
    "log-opts": {
      "max-size": "100m",
      "max-file": "5"
    },
    "features": {
        "buildkit": true
    },
    "live-restore": true,
    "default-address-pools" : [
    {
      "base" : "172.17.0.0/12",
      "size" : 20
    },
    {
      "base" : "192.168.0.0/16",
      "size" : 24
    }
  ]
}

Both versions have the following:

  • Ubuntu Version: 22.04.2 LTS (GNU/Linux 5.15.0-73-generic x86_64).
  • Coolify Version: v3.12.32.

Is there anything more that I can provide to help fix this issue?

danieldogeanu avatar Jun 16 '23 18:06 danieldogeanu

Due to inactivity, I'm closing this issue. However, if the problem persists, please reopen the issue. Thanks!

andrasbacsai avatar Sep 08 '23 08:09 andrasbacsai