Database becomes unreachable / name resolution fails with docker-compose.yml

Open JuliDi opened this issue 6 months ago • 0 comments

Bug Description

When setting up twenty with the provided docker-compose.yml, everything initially works as expected. After some time, however, the database intermittently becomes unreachable (DNS name resolution seems to fail), which causes error messages on the website. Most of the time, twenty recovers from this; the logs do not indicate a crash or restart of postgres, so the problem appears to be related only to DNS.

Eventually, these name resolution failures crash the entire service and the site becomes unreachable (the containers appear to have exited at that point). I then have to run podman-compose down and podman-compose up to get it working again. So this does not look like a configuration error per se, but rather like something crashing at runtime.
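
To narrow down when the lookups fail, a small probe could be run alongside the services. The following is a minimal sketch, not something taken from twenty: `probeOnce`, `ProbeResult`, and `Resolver` are hypothetical names, and the resolver is injectable so the function can be exercised without a real network. The default resolver uses Node's `dns.lookup`, which goes through `getaddrinfo`, the same call that fails in the log below.

```typescript
import { lookup } from "node:dns/promises";

// Outcome of a single resolution attempt.
type ProbeResult = { at: string; ok: boolean; detail: string };

// A resolver maps a hostname to an address; injectable for testing.
type Resolver = (hostname: string) => Promise<string>;

// Default resolver: dns.lookup is backed by getaddrinfo.
const systemResolver: Resolver = async (hostname) =>
  (await lookup(hostname)).address;

// Resolve `hostname` once and report either the address or the error code.
async function probeOnce(
  hostname: string,
  resolve: Resolver = systemResolver,
): Promise<ProbeResult> {
  const at = new Date().toISOString();
  try {
    return { at, ok: true, detail: await resolve(hostname) };
  } catch (err) {
    // Node attaches a string `code` such as 'ENOTFOUND' to DNS errors.
    const code = (err as { code?: string } | null)?.code ?? String(err);
    return { at, ok: false, detail: code };
  }
}
```

Calling `probeOnce("twenty-db")` on a timer (for example with `setInterval`) from a container on the same compose network and logging the results would show whether the failures are periodic or correlate with other events on the host.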

Log Example:

ecc77284652f Exception Captured
ecc77284652f   { user: undefined }
ecc77284652f   [
ecc77284652f     Error: getaddrinfo ENOTFOUND twenty-db
ecc77284652f         at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:108:26) {
ecc77284652f       errno: -3008,
ecc77284652f       code: 'ENOTFOUND',
ecc77284652f       syscall: 'getaddrinfo',
ecc77284652f       hostname: 'twenty-db'
ecc77284652f     }
ecc77284652f   ]

Edit: the above exception is one that allows the service to keep running. Here is a log example for the unhandled case that causes the service to become unreachable:

50b7515e958b 2024-08-07 21:03:50.324 GMT [75] LOG:  checkpoint complete: wrote 14 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.308 s, sync=0.004 s, total=1.316 s; sync files=11, longest=0.002 s, average=0.001 s; distance=77 kB, estimate=99 kB
f8fc7d0e40ee node:internal/errors:496
f8fc7d0e40ee     ErrorCaptureStackTrace(err);
f8fc7d0e40ee     ^
f8fc7d0e40ee
f8fc7d0e40ee Error [ERR_UNHANDLED_ERROR]: Unhandled error. ({
f8fc7d0e40ee   errno: -3008,
f8fc7d0e40ee   code: 'ENOTFOUND',
f8fc7d0e40ee   syscall: 'getaddrinfo',
f8fc7d0e40ee   hostname: 'twenty-db',
f8fc7d0e40ee   message: 'getaddrinfo ENOTFOUND twenty-db (Queue: __pgboss__send-it, Worker: 2ded992d-6008-47c0-80c1-c97a5a4637f0)',
f8fc7d0e40ee   stack: 'Error: getaddrinfo ENOTFOUND twenty-db (Queue: __pgboss__send-it, Worker: 2ded992d-6008-47c0-80c1-c97a5a4637f0)\n' +
f8fc7d0e40ee     '    at /app/node_modules/pg-pool/index.js:45:11\n' +
f8fc7d0e40ee     '    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n' +
f8fc7d0e40ee     '    at async Db.executeSql (/app/node_modules/pg-boss/src/db.js:28:14)\n' +
f8fc7d0e40ee     '    at async Manager.fetch (/app/node_modules/pg-boss/src/manager.js:497:16)\n' +
f8fc7d0e40ee     '    at async Worker.start (/app/node_modules/pg-boss/src/worker.js:49:22)',
f8fc7d0e40ee   queue: '__pgboss__send-it',
f8fc7d0e40ee   worker: '2ded992d-6008-47c0-80c1-c97a5a4637f0'
f8fc7d0e40ee })
f8fc7d0e40ee     at new NodeError (node:internal/errors:405:5)
f8fc7d0e40ee     at PgBoss.emit (node:events:503:17)
f8fc7d0e40ee     at PgBoss.emit (node:domain:489:12)
f8fc7d0e40ee     at Manager.<anonymous> (/app/node_modules/pg-boss/src/index.js:88:37)
f8fc7d0e40ee     at Manager.emit (node:events:514:28)
f8fc7d0e40ee     at Manager.emit (node:domain:489:12)
f8fc7d0e40ee     at Worker.onError (/app/node_modules/pg-boss/src/manager.js:256:12)
f8fc7d0e40ee     at Worker.start (/app/node_modules/pg-boss/src/worker.js:70:14)
f8fc7d0e40ee     at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
f8fc7d0e40ee   code: 'ERR_UNHANDLED_ERROR',
f8fc7d0e40ee   context: {
f8fc7d0e40ee     errno: -3008,
f8fc7d0e40ee     code: 'ENOTFOUND',
f8fc7d0e40ee     syscall: 'getaddrinfo',
f8fc7d0e40ee     hostname: 'twenty-db',
f8fc7d0e40ee     message: 'getaddrinfo ENOTFOUND twenty-db (Queue: __pgboss__send-it, Worker: 2ded992d-6008-47c0-80c1-c97a5a4637f0)',
f8fc7d0e40ee     stack: 'Error: getaddrinfo ENOTFOUND twenty-db (Queue: __pgboss__send-it, Worker: 2ded992d-6008-47c0-80c1-c97a5a4637f0)\n' +
f8fc7d0e40ee       '    at /app/node_modules/pg-pool/index.js:45:11\n' +
f8fc7d0e40ee       '    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n' +
f8fc7d0e40ee       '    at async Db.executeSql (/app/node_modules/pg-boss/src/db.js:28:14)\n' +
f8fc7d0e40ee       '    at async Manager.fetch (/app/node_modules/pg-boss/src/manager.js:497:16)\n' +
f8fc7d0e40ee       '    at async Worker.start (/app/node_modules/pg-boss/src/worker.js:49:22)',
f8fc7d0e40ee     queue: '__pgboss__send-it',
f8fc7d0e40ee     worker: '2ded992d-6008-47c0-80c1-c97a5a4637f0'
f8fc7d0e40ee   }
f8fc7d0e40ee }
f8fc7d0e40ee
f8fc7d0e40ee Node.js v18.17.1
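
For context on the trace above: `ERR_UNHANDLED_ERROR` is what Node.js's `EventEmitter` throws when an `'error'` event carrying a non-`Error` value is emitted and no `'error'` listener is registered. The sketch below reproduces only that mechanism with a plain `EventEmitter`; it does not use pg-boss, and `emitError`, `withoutListener`, and `withListener` are hypothetical names. The payload is a trimmed copy of the `context` object from the log.

```typescript
import { EventEmitter } from "node:events";

// Payload shaped like the `context` object in the trace above (trimmed).
const dnsFailure = {
  code: "ENOTFOUND",
  syscall: "getaddrinfo",
  hostname: "twenty-db",
};

// Emit 'error' on the given emitter and report what happened.
function emitError(emitter: EventEmitter): string {
  try {
    emitter.emit("error", dnsFailure);
    return "handled";
  } catch (err) {
    // Without a listener, emit() throws synchronously.
    return (err as { code?: string }).code ?? "unknown";
  }
}

const withoutListener = new EventEmitter();

const withListener = new EventEmitter();
const seen: unknown[] = [];
withListener.on("error", (e) => seen.push(e));

console.log(emitError(withoutListener)); // ERR_UNHANDLED_ERROR
console.log(emitError(withListener)); // handled
```

pg-boss instances are event emitters and its documentation recommends registering an `'error'` listener (`boss.on('error', ...)`). Whether the worker in these images does so is an assumption that has not been checked against twenty's source; the trace only shows that the event reached `PgBoss.emit` without being handled.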

Expected behavior

DNS resolution works at all times, and temporary DNS resolution errors do not lead to a full crash or to the site becoming unreachable.
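
One way the second half of this expectation could be met is to retry operations that fail with a transient resolution error instead of letting the error propagate. This is a minimal sketch under stated assumptions, not twenty's or pg-boss's actual behavior: `withDnsRetry` and `isTransientDnsError` are hypothetical helpers, and treating `ENOTFOUND` and `EAI_AGAIN` as transient is an assumption that fits this report but not every deployment.

```typescript
// Error codes treated as transient name-resolution failures (an assumption).
const TRANSIENT_CODES = new Set(["ENOTFOUND", "EAI_AGAIN"]);

// True if `err` carries one of the codes above.
function isTransientDnsError(err: unknown): boolean {
  const code = (err as { code?: string } | null)?.code;
  return code !== undefined && TRANSIENT_CODES.has(code);
}

// Run `op`, retrying with exponential backoff on transient DNS errors.
// Any other error, or the last failed attempt, is rethrown unchanged.
async function withDnsRetry<T>(
  op: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= attempts || !isTransientDnsError(err)) throw err;
      // Backoff with the defaults: 200 ms, 400 ms, 800 ms, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A database call would be wrapped as `withDnsRetry(() => pool.query(...))`, where `pool` stands for whatever client the application uses. This only masks short outages; it does not explain why the name stops resolving in the first place.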

Technical inputs

I am using podman instead of docker and run the services in a rootless environment. There are several other docker-compose-based services on the same server that use postgres or other databases. Their configuration is essentially the same, so this does not appear to be a fundamental problem with my setup or with DNS resolution inside podman.

However, the other services use the postgres image directly (e.g., docker.io/postgres:13.1-alpine) and not a customized bitnami image like twenty.

The issue occurs with the latest tag of the docker images (i.e., v0.23) and with v0.22.

docker-compose.yml (redacted where necessary)

version: "3.9"
name: twenty

services:
  change-vol-ownership:
    image: docker.io/ubuntu
    user: root
    volumes:
      - /containers/twenty-crm/data:/data
      - /containers/twenty-crm/server-local-data:/tmp/server-local-data
      - /containers/twenty-crm/docker-data:/tmp/docker-data
      - /containers/twenty-crm/db-data:/tmp/db-data
    command: >
      bash -c "
      chown -R 1000:1000 /tmp/server-local-data
      && chown -R 1000:1000 /tmp/docker-data
      && chown -R 1001:1001 /tmp/db-data"

  server:
    image: docker.io/twentycrm/twenty:v0.22
    volumes:
      - /containers/twenty-crm/server-local-data:/app/packages/twenty-server/.local-storage
      - /containers/twenty-crm/docker-data:/app/docker-data
    ports:
      - "127.0.0.1:xxxx:3000"
    environment:
      PORT: 3000
      PG_DATABASE_URL: postgres://twenty:twenty@twenty-db:5432/default
      SERVER_URL: "https://crm.example.com"
      FRONT_BASE_URL: "https://crm.example.com"
      MESSAGE_QUEUE_TYPE: "pg-boss"

      ENABLE_DB_MIGRATIONS: "true"

      SIGN_IN_PREFILLED: "true"
      STORAGE_TYPE: "local"

      ACCESS_TOKEN_SECRET: "redacted"
      LOGIN_TOKEN_SECRET: "redacted"
      REFRESH_TOKEN_SECRET: "redacted"
      FILE_TOKEN_SECRET: "redacted"
    depends_on:
      change-vol-ownership:
        condition: service_completed_successfully
      db:
        condition: service_healthy
    healthcheck:
      test: curl --fail http://localhost:3000/healthz
      interval: 5s
      timeout: 10s
      retries: 20
    restart: always

  worker:
    image: docker.io/twentycrm/twenty:v0.22
    command: ["yarn", "worker:prod"]
    environment:
      PG_DATABASE_URL: postgres://twenty:twenty@twenty-db:5432/default
      SERVER_URL: "https://crm.example.com"
      FRONT_BASE_URL: "https://crm.example.com"
      MESSAGE_QUEUE_TYPE: "pg-boss"

      ENABLE_DB_MIGRATIONS: "false" # it already runs on the server

      STORAGE_TYPE: "local"

      ACCESS_TOKEN_SECRET: "redacted"
      LOGIN_TOKEN_SECRET: "redacted"
      REFRESH_TOKEN_SECRET: "redacted"
      FILE_TOKEN_SECRET: "redacted"

      EMAIL_SMTP_HOST: "redacted"
      EMAIL_SMTP_PORT: "redacted"
      EMAIL_SMTP_USER: "redacted"
      EMAIL_SMTP_PASSWORD: "redacted"
      EMAIL_FROM_NAME: "redacted"
      EMAIL_FROM_ADDRESS: "redacted"
    depends_on:
      twenty-db:
        condition: service_healthy
      server:
        condition: service_healthy
    restart: always

  twenty-db:
    image: docker.io/twentycrm/twenty-postgres:v0.22
    volumes:
      - /containers/twenty-crm/db-data:/bitnami/postgresql
    depends_on:
      change-vol-ownership:
        condition: service_completed_successfully
    environment:
      POSTGRES_PASSWORD: "redacted"
    healthcheck:
      test: pg_isready -U twenty -d default
      interval: 5s
      timeout: 10s
      retries: 20
    restart: always

JuliDi avatar Aug 07 '24 15:08 JuliDi