
[BUG] ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-jpeg-thumbnail) and (library/library-refresh-asset)

Open nodis opened this issue 1 year ago • 6 comments

The bug

I deleted the container, re-pulled and rebuilt Immich, and mounted the external library; initially everything was normal. Once the number of scanned images exceeded roughly 50,000 (my estimate), a serious error occurred and the "immich_microservices" container kept restarting:

[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Unable to run job handler (library/library-refresh-asset): Error: timeout exceeded when trying to connect
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:205:27)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Object:
{
  "id": "8b05f464-3703-4834-a886-89d97f1bbdda",
  "assetPath": "/xxxxx/IMG_20150704_232250_1440302612122.jpg",
  "ownerId": "92a7be28-90c5-467c-9b14-afca9xxxxxxbe",
  "force": false
}

[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Unable to run job handler (library/library-refresh-asset): Error: timeout exceeded when trying to connect
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:205:27)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Object:
{
  "id": "8b05f464-3703-4834-a886-89d97f1bbdda",
  "assetPath": "xxxxx/IMG_20150704_232250.jpg",
  "ownerId": "92a7be28-90c5-467c-9b14-afca9254ddbe",
  "force": false
}

[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Unable to run job handler (metadataExtraction/metadata-extraction): Error: timeout exceeded when trying to connect
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:205:27)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Object:
{
  "id": "7ee0de24-d2d2-45de-bdfe-4f74efb768b7",
  "source": "upload"
}

[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Unable to run job handler (metadataExtraction/metadata-extraction): Error: timeout exceeded when trying to connect
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:205:27)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Object:
{
  "id": "2d915491-d293-4296-8d5c-62a3b4427a5f",
  "source": "upload"
}

[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Unable to run job handler (objectTagging/classify-image): Error: timeout exceeded when trying to connect
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:205:27)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Object:
{
  "id": "64ffc93a-3a53-412d-9aa5-1d4c23b9e484"
}

[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Unable to run job handler (objectTagging/classify-image): Error: timeout exceeded when trying to connect
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:205:27)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
[Nest] 7  - 11/03/2023, 4:22:16 PM   ERROR [JobService] Object:
{
  "id": "53fd3caa-d794-4ea6-bcf8-ad5c8bae7810"
}

[Nest] 7  - 11/03/2023, 4:22:17 PM   ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-jpeg-thumbnail): Error: timeout exceeded when trying to connect
[Nest] 7  - 11/03/2023, 4:22:17 PM   ERROR [JobService] Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:205:27)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
[Nest] 7  - 11/03/2023, 4:22:17 PM   ERROR [JobService] Object:
{
  "id": "fde72f61-cb7e-422a-8e32-98a263576be8"
}

[Nest] 7  - 11/03/2023, 4:22:17 PM   ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-jpeg-thumbnail): Error: timeout exceeded when trying to connect
[Nest] 7  - 11/03/2023, 4:22:17 PM   ERROR [JobService] Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:205:27)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
[Nest] 7  - 11/03/2023, 4:22:17 PM   ERROR [JobService] Object:

The OS that Immich Server is running on

Synology

Version of Immich Server

1.84

Version of Immich Mobile App

1.84

Platform with the issue

  • [X] Server
  • [ ] Web
  • [ ] Mobile

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: ["start.sh", "immich"]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
      - /volume1:/media/volume1:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.yml
    #   service: hwaccel
    command: ["start.sh", "microservices"]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
      - /volume1:/media/volume1:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - /volume2/docker/immich/machine/model-cache:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.1@sha256:9bcff2b829f12074426ca044b56160ca9d777a0c488303469143dd9f8259d4dd
    # image: typesense/typesense:0.25.1
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
      # remove this to get debug messages
      - GLOG_minloglevel=1
    volumes:
      - /volume2/docker/immich/tsdata:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:70a7a5b641117670beae0d80658430853896b5ef269ccf00d1827427e3263fa3
    restart: always

  database:
    container_name: immich_postgres
    image: postgres:14-alpine@sha256:28407a9961e76f2d285dc6991e8e48893503cc3836a4755bbc2d40bcc272a441
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
    volumes:
      - /volume2/docker/immich/pgdata:/var/lib/postgresql/data
    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    ports:
      - 2283:8080
    depends_on:
      - immich-server
      - immich-web
    restart: always

volumes:
  pgdata:
  model-cache:
  tsdata:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=/volume1/Immich

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secrets for postgres and typesense. You should change these to random passwords
TYPESENSE_API_KEY=xxxxxxNqo26NFRsFBmYymuv+KLcGbNBSKHQE99w2QD8=
DB_PASSWORD=xxxxxxxx

# The values below this line do not need to be changed
###################################################################################
DB_HOSTNAME=immich_postgres
DB_USERNAME=xxxxx
DB_DATABASE_NAME=immich

REDIS_HOSTNAME=immich_redis

Reproduction steps

1.
2.
3.
...

Additional information

No response

nodis avatar Nov 03 '23 16:11 nodis

After upgrading to 1.85.0, this issue still exists

nodis avatar Nov 08 '23 13:11 nodis

After upgrading to 1.86.0, this issue still exists

nodis avatar Nov 17 '23 03:11 nodis

@nodis can you try to reduce the concurrency of all jobs to 1 and see if this issue still occurs?

alextran1502 avatar Nov 17 '23 03:11 alextran1502

I am not the author of this ticket, but I had a similar issue:

Error: Machine learning request to "http://immich-machine-learning:3003" failed with ConnectTimeoutError: Connect Timeout Error

It stopped when I set the concurrency for Smart Search and Face Detection to 1.

myxor avatar Feb 02 '24 08:02 myxor

I can confirm this; once the "Face Detection Concurrency" setting is set back to "1" (the default), the error messages disappear from the log files.

bitforker avatar Mar 11 '24 07:03 bitforker

I see the same pg-pool error on Immich v1.99.0 from time to time. If my DB is restarted or the CPU load on the server is very high, Immich's DB pool logs errors saying the DB connection timed out, but it never reconnects to the database by itself; I have to restart the pod manually to get rid of the 500 errors in the browser.

Lowering concurrency has only made the errors less frequent; it has not gotten rid of them completely. I think PG connections can leak when the load on the server is high, and they should be invalidated and reconnected automatically.

[Nest] 13  - 03/21/2024, 12:52:41 PM   ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-webp-thumbnail): Error: timeout exceeded when trying to connect
[Nest] 13  - 03/21/2024, 12:52:41 PM   ERROR [JobService] Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/usr/src/app/node_modules/pg-pool/index.js:205:27)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
[Nest] 13  - 03/21/2024, 12:52:41 PM   ERROR [JobService] Object:
{
  "id": "f6543fe8-adc0-40d0-91c4-1c26cea86009"
}

... many similar errors below
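For context on where the timeout in these traces comes from: pg-pool raises "timeout exceeded when trying to connect" when a client cannot be checked out of the pool within its acquisition timeout, for example because all pooled connections are busy (high job concurrency) or the server is too loaded to hand one back. A minimal standalone sketch of the pg-pool options involved follows; the option names are pg-pool's own, but the values (and reusing Immich's DB_* env variables here) are purely illustrative, not Immich's actual configuration:

```typescript
import { Pool } from 'pg';

// Illustrative values only -- not Immich's actual pool configuration.
const pool = new Pool({
  host: process.env.DB_HOSTNAME,
  user: process.env.DB_USERNAME,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_DATABASE_NAME,
  max: 10,                        // maximum clients held in the pool
  connectionTimeoutMillis: 10000, // how long a checkout may wait before
                                  // "timeout exceeded when trying to connect"
  idleTimeoutMillis: 30000,       // close clients that sit idle for 30s
});

// Without an error handler, an error on an idle client can crash the process.
pool.on('error', (err) => {
  console.error('Unexpected error on idle client', err);
});
```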

UPD: it seems TypeORM doesn't support reconnecting to the DB if the pool connections are dead. Can we incorporate this solution to reconnect to the database on errors?
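For illustration, a retry wrapper along those lines is sketched below. This is only an assumption about how the application side could paper over transient acquisition timeouts; it is not how Immich or TypeORM actually handle it, and `queryWithRetry`, the attempt count, and the back-off are hypothetical:

```typescript
import { Pool } from 'pg';

// Hypothetical helper: retry a query when acquiring a connection times out,
// instead of letting the job handler fail immediately.
async function queryWithRetry<T>(
  pool: Pool,
  sql: string,
  params: unknown[] = [],
  attempts = 3,
): Promise<T[]> {
  for (let i = 1; i <= attempts; i++) {
    try {
      const result = await pool.query(sql, params);
      return result.rows as T[];
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      const transient = message.includes('timeout exceeded when trying to connect');
      if (!transient || i === attempts) throw err;
      // Back off before retrying so a briefly overloaded DB can recover.
      await new Promise((resolve) => setTimeout(resolve, 1000 * i));
    }
  }
  throw new Error('unreachable');
}
```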

maksimkurb avatar Mar 21 '24 11:03 maksimkurb