Running tasks are not power-loss / kernel-panic safe.
I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.
- [x] Yes
The bug
Sadly, my Hetzner server sometimes force-reboots because the CPU temperature gets too high. I'm on it, but I noticed that Immich has no transition-safe states. Tasks should have an "is generating" state in the DB, so that unfinished tasks can be re-initiated on reboot before faulty data is inserted into the live environment.
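To illustrate what I mean by an "is generating" state, here is a minimal sketch. It is not Immich's actual job system: the `job_state` table, the function names, and the use of SQLite in place of the real database are all assumptions made up for this example.

```python
import sqlite3

def open_db(path=":memory:"):
    # Hypothetical table: one row per asset with the state of its job.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS job_state ("
        " asset_id TEXT PRIMARY KEY,"
        " state TEXT NOT NULL CHECK (state IN ('generating', 'done')))"
    )
    db.commit()
    return db

def run_job(db, asset_id, work):
    # Record and commit the intent before any output is produced.
    db.execute(
        "INSERT OR REPLACE INTO job_state (asset_id, state) VALUES (?, 'generating')",
        (asset_id,),
    )
    db.commit()
    work(asset_id)  # e.g. thumbnail generation; may be interrupted at any point
    # Only mark the job done once the work has returned.
    db.execute("UPDATE job_state SET state = 'done' WHERE asset_id = ?", (asset_id,))
    db.commit()

def unfinished_jobs(db):
    # On startup, anything still marked 'generating' was interrupted and can be re-queued.
    rows = db.execute(
        "SELECT asset_id FROM job_state WHERE state = 'generating' ORDER BY asset_id"
    )
    return [row[0] for row in rows]
```

On boot, the server would call something like `unfinished_jobs(db)` and re-run those jobs instead of trusting whatever partial output they left behind.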
The OS that Immich Server is running on
Arch via docker compose
Version of Immich Server
v1.134.0
Version of Immich Mobile App
irrelevant
Platform with the issue
- [x] Server
- [ ] Web
- [ ] Mobile
Your docker-compose.yml content
#
# WARNING: To install Immich, follow our guide: https://immich.app/docs/install/docker-compose
#
# Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
name: immich
services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    # ports:
    #   - '2283:2283'
    networks:
      - immich
      - reverseproxy
    depends_on:
      - redis
      - database
    restart: unless-stopped
    healthcheck:
      disable: false
  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino, rknn] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl, rknn] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    networks:
      - immich
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: unless-stopped
    healthcheck:
      disable: false
  redis:
    container_name: immich_redis
    image: docker.io/valkey/valkey:8-bookworm@sha256:ff21bc0f8194dc9c105b769aeabf9585fea6a8ed649c0781caeac5cb3c247884
    networks:
      - immich
    healthcheck:
      test: redis-cli ping || exit 1
    restart: unless-stopped
  database:
    container_name: immich_postgres
    image: ghcr.io/immich-app/postgres:14-vectorchord0.3.0-pgvectors0.2.0@sha256:fa4f6e0971f454cd95fec5a9aaed2ed93d8f46725cc6bc61e0698e97dba96da1
    networks:
      - immich
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
      # Uncomment the DB_STORAGE_TYPE: 'HDD' var if your database isn't stored on SSDs
      # DB_STORAGE_TYPE: 'HDD'
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    restart: unless-stopped
volumes:
  model-cache:
networks:
  reverseproxy:
    name: reverseproxy
    external: true
  immich:
    name: immich
Your .env content
# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables
# The location where your uploaded files are stored
UPLOAD_LOCATION=/mnt/storage/immich/library
# The location where your database files are stored. Network shares are not supported for the database
DB_DATA_LOCATION=./db
# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List
TZ=Europe/Berlin
# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release
# Connection secret for postgres. You should change it to a random password
# Please use only the characters `A-Za-z0-9`, without special characters or spaces
DB_PASSWORD=redactedForObviousReason
# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich
Reproduction steps
- Start a file import via the mobile app or API with a couple hundred pictures
- Wait until tasks start generating previews etc.
- Cut power / initiate a kernel panic
- Reboot the server
- Browse Immich web -> broken thumbnails and previews, only replaceable by deletion and re-upload
Relevant log output
Cutting server power produces no logs.
Additional information
No response
Is the redis container state being lost on reboot? If so you might want to mount a volume for it.
> only replaceable by deletion and re-upload
This is probably not the case btw. There are buttons in the top-right menu to refresh jobs for an asset, or you can run jobs in bulk from the admin panel.
Nah, the button doesn't do anything; the preview stayed broken. Does it need browser cache invalidation? I didn't check for that.
> Is the redis container state being lost on reboot? If so you might want to mount a volume for it.
Well, it's the recommended config with docker.io/valkey/valkey:8-bookworm, so I suppose it would be wise to consider adding persistence to the default config, either via appendonly or another Redis mechanism.
I'm wondering whether a race condition could come up: is the "in progress" state synced to storage, waiting for completion? Otherwise a power loss would cause the same issue. It has to be blocking I/O, or it's not doing its job.
It's a Docker volume btw, so persistence exists across reboots.
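By "blocking I/O" I mean something along these lines. This is only a sketch of the usual write-then-fsync-then-rename pattern on POSIX systems, not how Immich writes files; `durable_write` is a hypothetical helper name.

```python
import os
import tempfile

def durable_write(path, data):
    """Write bytes so that `path` holds either the old or the complete new content."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # block until the data has reached the disk
        os.replace(tmp_path, path)  # atomic rename within the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a power loss (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Only after a call like this returns would it be safe to record the file as persisted in the database.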
Update:
The previews don't generate because the file uploads ended up in a weird broken state:
in essence, the raw files have 0 bytes, while the database has metadata and checksums for them.
I don't see any way that could happen from Immich's end - I suspect it's your filesystem causing problems.
Sorry, I've been flat-lined for the last few days with an intense cold, so here we go:
The database is hosted on an NVMe RAID 0, so obviously anything written to the database persists pretty much instantly.
The photo library is stored on a RAID 0 with two HDDs, which is, in a sense, slow as f*ck to persist anything.
Now, considering that people detection etc. ran through before the original image was successfully persisted, I'd suggest adding a mechanism that, on Immich startup, checks whether the latest images were actually persisted, rather than assuming they already exist when someone tries to re-upload.
Dead images can't be overwritten as long as they exist in the database.
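The startup check I have in mind could look roughly like this. It is a sketch under assumptions: the `(path, expected checksum)` records would come from the database, SHA-1 is assumed as the checksum algorithm, and `find_broken_assets` is a made-up name, not an existing Immich function.

```python
import hashlib
import os

def file_sha1(path, chunk_size=1 << 20):
    # Hash the file in chunks so large originals don't need to fit in memory.
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_broken_assets(records):
    """records: iterable of (path, expected_sha1_hex); returns a list of (path, reason)."""
    broken = []
    for path, expected in records:
        if not os.path.exists(path):
            broken.append((path, "missing"))
        elif os.path.getsize(path) == 0:
            broken.append((path, "empty"))  # the 0-byte case described above
        elif file_sha1(path) != expected:
            broken.append((path, "checksum mismatch"))
    return broken
```

Running this only over the most recently uploaded assets would keep the check cheap, and anything it reports could be unlocked for re-upload.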
I don't think this is something that we're going to investigate or fix. We have some things in place to help the user manage such situations though:
- Immich doesn't create a record in the database until the entire file is uploaded and the checksum validated (to be unique)
- Jobs are indeed not cleared or automatically restarted/reset on boot, this is by design
- There is an entire administration screen on web which enables the user to resolve these types of issues on their own
- Jobs are automatically scheduled to run at night which will generate thumbnails for assets that are missing them, and other database related clean-up tasks
- #12293 tracks general "integrity" features, with the goal of being able to detect when the contents of a file on disk are changed/missing, including information about generated thumbnails.