immich
Microservices container hangs with a certain combination of transcoding settings and hardware
The bug
Immich is running in Docker, using a Proxmox LXC as host. Hardware acceleration is turned on and set up correctly. The CPU is AMD GX-415GA with Radeon HD8330E as the GPU.
Running "transcode all" with a certain combination of settings causes the microservices container to hang on a particular video. At that point, neither the CPU nor the GPU shows the load expected while transcoding is in progress (htop reports low utilization), so nothing appears to be running. The Docker container can't be killed from inside the LXC, nor can the LXC be killed from the PVE host. The only way to recover is to reboot the node. After rebooting, everything works fine until another transcoding job is started, at which point the hang recurs.
Transcoding configuration JSON:
{
  "ffmpeg": {
    "crf": 23,
    "threads": 3,
    "preset": "medium",
    "targetVideoCodec": "h264",
    "acceptedVideoCodecs": [
      "h264"
    ],
    "targetAudioCodec": "aac",
    "acceptedAudioCodecs": [
      "aac",
      "mp3",
      "libopus"
    ],
    "targetResolution": "480",
    "maxBitrate": "2000",
    "bframes": -1,
    "refs": 0,
    "gopSize": 0,
    "npl": 0,
    "temporalAQ": false,
    "cqMode": "auto",
    "twoPass": false,
    "preferredHwDevice": "auto",
    "transcode": "bitrate",
    "tonemap": "hable",
    "accel": "vaapi"
  },
A transcoding job has already run before without problems. From memory, the config differences were:
- Preset: faster instead of medium
- Max bitrate: unset instead of 2000
- Threads: unset instead of 3
- Transcode policy: "only videos not in an accepted format" instead of "Videos higher than max bitrate or not in an accepted format"
I modified the docker-compose file manually to store the Postgres DB in ./pgdata; this was before the breaking change to docker-compose.yml. I have not run docker compose pull since then.
The log line pasted below is the last one shown before the container hangs. As mentioned above, nothing actually seems to be transcoded.
The OS that Immich Server is running on
Proxmox VE 8.1.5, LXC container with Debian 12
Version of Immich Server
v1.105.1
Version of Immich Mobile App
v1.105.1
Platform with the issue
- [X] Server
- [ ] Web
- [ ] Mobile
Your docker-compose.yml content
version: '3.8'
#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#
name: immich
services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: ['start.sh', 'immich']
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always
  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/hardware-transcoding
      file: hwaccel.transcoding.yml
      service: vaapi # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    command: ['start.sh', 'microservices']
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
    restart: always
  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
  redis:
    container_name: immich_redis
    image: registry.hub.docker.com/library/redis:6.2-alpine@sha256:51d6c56749a4243096327e3fb964a48ed92254357108449cb6e23999c37773c5
    restart: always
  database:
    container_name: immich_postgres
    image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
    volumes:
      - ./pgdata:/var/lib/postgresql/data
    restart: always
volumes:
  pgdata:
  model-cache:
Your .env content
# You can find documentation for all the supported env variables at https://immich.app/docs/insta>
# The location where your uploaded files are stored
UPLOAD_LOCATION=./library
# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release
# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=[PASSWORD]
# The values below this line do not need to be changed
###################################################################################
DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_DATABASE_NAME=immich
REDIS_HOSTNAME=immich_redis
Reproduction steps
1. Enable VAAPI in the LXC and grant the container the necessary device permissions
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.hook.pre-start: sh -c "chown 100000:111000 /dev/dri/renderD128"
lxc.cgroup2.devices.allow: c 235:0 rwm
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
lxc.hook.pre-start: sh -c "chown 100000:111000 /dev/kfd"
2. Start a transcoding job with the settings above
3. The docker container locks up completely, PVE node needs to be restarted
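Before step 2, it may help to confirm that the two device nodes from step 1 are actually visible and usable where ffmpeg runs. The following is a minimal Python sketch, not part of Immich; the helper name `describe_device` is hypothetical, and the paths are the ones from the LXC config above. It only reports what the kernel exposes and does not exercise the GPU.

```python
import os
import stat


def describe_device(path):
    """Return basic facts about a device node, or mark it as missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return {"path": path, "exists": False}
    return {
        "path": path,
        "exists": True,
        # Both /dev/dri/renderD128 and /dev/kfd should be character devices.
        "is_char_device": stat.S_ISCHR(st.st_mode),
        # Compare with the "c MAJOR:MINOR" values in lxc.cgroup2.devices.allow.
        "major": os.major(st.st_rdev),
        "minor": os.minor(st.st_rdev),
        "uid": st.st_uid,
        "gid": st.st_gid,
        # True if the current user may open the node for reading and writing.
        "read_write": os.access(path, os.R_OK | os.W_OK),
    }


if __name__ == "__main__":
    for node in ("/dev/dri/renderD128", "/dev/kfd"):
        print(describe_device(node))
```

Run inside the LXC (and, if Python is available there, inside the container), the reported major/minor pairs can be compared with the `226:128` and `235:0` entries in the config.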
Relevant log output
[Nest] 7 - 06/02/2024, 10:04:03 AM LOG [ImmichMicroservices] [MediaService] Started encoding video a5d278de-239c-4713-ba9e-a862024a525c {"inputOptions":["-init_hw_device vaapi=accel:/dev/dri/renderD128","-filter_hw_device accel"],"outputOptions":["-c:v h264_vaapi","-c:a copy","-movflags faststart","-fps_mode passthrough","-map 0:0","-map 0:1","-g 256","-v verbose","-vf format=nv12,hwupload,scale_vaapi=480:-2","-compression_level 4","-threads 3","-b:v 1380","-maxrate 2000","-minrate 690","-rc_mode 3"],"twoPass":false}
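To reproduce the hang outside Immich, the logged options can be turned into a plain ffmpeg command line. The Python sketch below copies the option strings from the log line above and assumes ffmpeg's usual ordering (input options before `-i`, output options before the output file). The function name `build_ffmpeg_argv` and the file names `input.mp4` and `output.mp4` are placeholders, not values from the log.

```python
import shlex

# Option strings copied verbatim from the "Started encoding video" log line.
INPUT_OPTIONS = [
    "-init_hw_device vaapi=accel:/dev/dri/renderD128",
    "-filter_hw_device accel",
]
OUTPUT_OPTIONS = [
    "-c:v h264_vaapi", "-c:a copy", "-movflags faststart",
    "-fps_mode passthrough", "-map 0:0", "-map 0:1", "-g 256",
    "-v verbose", "-vf format=nv12,hwupload,scale_vaapi=480:-2",
    "-compression_level 4", "-threads 3", "-b:v 1380",
    "-maxrate 2000", "-minrate 690", "-rc_mode 3",
]


def build_ffmpeg_argv(src, dst):
    """Assemble an argv list: input options, -i src, output options, dst."""
    argv = ["ffmpeg"]
    for opt in INPUT_OPTIONS:
        argv += shlex.split(opt)
    argv += ["-i", src]
    for opt in OUTPUT_OPTIONS:
        argv += shlex.split(opt)
    argv.append(dst)
    return argv


if __name__ == "__main__":
    # Placeholder file names; substitute the video that triggers the hang.
    print(shlex.join(build_ffmpeg_argv("input.mp4", "output.mp4")))
```

One detail worth checking when running this by hand: the logged `-b:v`, `-maxrate` and `-minrate` values carry no unit suffix, and ffmpeg reads bare numbers as bits per second. If the configured max bitrate of 2000 was meant as kbit/s (an assumption), the encoder is being asked for a far lower rate than intended. Whether that is related to the hang is not established in this thread.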
Additional information
No response
How much RAM does the server have? If CPU/GPU utilization is low and things are freezing, I wonder if a lack of RAM is causing it to slow to a crawl.
It has enough RAM: 8 GB in total, of which 4 GB are available to the LXC. There was no abnormal RAM usage either; I would have noticed that in the Proxmox dashboard.
Can you confirm if this issue still happens when using CPU instead of VAAPI?
Well, not really: the microservices container was removed in the last update. However, given the very strange behaviour (neither Docker nor the LXC can be terminated, not even with kill -9, short of restarting the entire system), this seems to be a problem with VAAPI and this particular GPU.
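A process that ignores kill -9 is often in uninterruptible sleep (state `D`), waiting inside a kernel or driver call. That is only one possible explanation here and is not confirmed in this thread. A minimal Python sketch for checking it reads the state letter from `/proc/<pid>/stat`; the helper names are hypothetical.

```python
import os


def parse_proc_state(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading the state field.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read the current state letter of a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        return parse_proc_state(fh.read())


if __name__ == "__main__":
    # Demo on this script's own PID; pass the PID of the stuck ffmpeg instead.
    print(process_state(os.getpid()))
```

A result of `D` for the stuck ffmpeg process would point toward the kernel GPU driver rather than Immich or Docker, and `dmesg` on the PVE host would be the next place to look.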
I'm having similar hangs, with ffmpeg sitting forever on really tiny video files, using VAAPI on an RX 6700 XT. The machine in question has 128 GB of RAM. top shows the ffmpeg process spinning at 100% of one core. Terminating the process emits log output showing that ffmpeg is stuck on frame 0 with no progress. I can file a separate issue if it's relevant.