bug: rsync backup of Immich data stalls, leaves a zombie process, and causes memory pressure
Describe the bug
- Attempting to back up the Immich data directory with rsync results in the process stalling indefinitely.
- Rsync freezes at 0% (xfr#0, ir-chk=1003/1402) for hours.
- Killing the process leaves behind a zombie [rsync] child.
- During long stalls, Portainer and other containers stopped unexpectedly, and system memory dropped to under 1 GiB free (likely due to page cache exhaustion or OOM events).
- Only a fraction of the data (e.g., backups/ and part of encoded-video/) was copied before stalling; large directories like library/ (~325 GB) and thumbs/ (~5 GB) were untouched.
To Reproduce
Steps to reproduce:
- Stop Immich container(s).
- Run the backup while free RAM is low:
rsync -avh --progress /media/files/photos/immich/ /media/files/photos-bk/
- Observe that rsync starts transferring a few files, then halts.
Expected behavior
- Rsync should either complete the backup or error out cleanly.
- Processes should terminate fully without leaving zombies.
- System services (like Portainer) should not be disrupted by a userland file copy.
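One way to approximate the "error out cleanly" behavior, whatever rsync itself does, is to wrap the copy in coreutils `timeout`. This is a minimal sketch, assuming GNU coreutils is installed; `run_with_deadline` is a hypothetical helper name and the 10-second grace period is arbitrary. A process stuck in uninterruptible I/O may not respond even to KILL, so this is not a guaranteed fix.

```shell
# Hypothetical helper: run a command with a hard deadline.
# $1 is the limit (timeout accepts suffixes such as s, m, h);
# the remaining arguments are the command to run.
# TERM is sent at the deadline, KILL 10 seconds later if it is still alive.
# Exit status is 124 when the deadline was hit and TERM ended the command.
run_with_deadline() {
    rwd_limit=$1
    shift
    timeout --kill-after=10 "$rwd_limit" "$@"
}

# Example invocation (not executed here):
# run_with_deadline 8h rsync -aH --info=progress2 \
#     /media/files/photos/immich/ /media/files/photos-bk/
```

Checking for status 124 afterwards distinguishes "hit the deadline" from an ordinary rsync error code.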
Actual behavior
- Rsync stalls during directory traversal.
- Zombie processes remain even after termination attempts.
- System shows reduced available memory, and containers like Portainer crash/stop.
Some Logs & Output
Resume command I tried
rsync -aH --info=progress2 --partial --append-verify \
/media/files/photos/immich/ /media/files/photos-bk/
32,768 0% 0.00kB/s 0:00:00
I let this sit overnight (about 8 hours) with no further progress.
Checking whether rsync was still running, then force-killing it:
mike@markvm2:~$ ps -o pid,ppid,stat,etime,cmd -p 2778605,2778606
PID PPID STAT ELAPSED CMD
2778605 7461 R+ 13:42:13 rsync -aH --info=progress2 --partial --append-verify /media/files/photos/immich/ /media/files/photos-bk/
2778606 2778605 Z+ 13:42:13 [rsync] <defunct>
mike@markvm2:~$ sudo kill -KILL 2778605
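A zombie is already dead and cannot itself be killed; it stays in the process table until its parent reaps it or exits. As a rough sketch for spotting such entries and their parents, the filter below reads `ps` output whose third column is STAT, as in the listing above. `find_zombies` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: print "PID PPID" for every zombie process.
# Expects `ps -o pid,ppid,stat,...` style text on stdin:
# a header line, then rows whose third column is STAT.
find_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2 }'
}

# Live usage (not executed here):
# ps -eo pid,ppid,stat,etime,cmd | find_zombies
```

The second number on each output line is the parent to look at; once that parent exits, init normally reaps the zombie.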
Checking free space on the drive
df -h /media/files/photos/immich /media/files/photos-bk
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.1T 2.8T 6.4T 31% /media/files
/dev/sda1 9.1T 2.8T 6.4T 31% /media/files
Compare sizes
mike@markvm2:~$ du -sh /media/files/photos/immich/* | sort -h
136K /media/files/photos/immich/profile
300K /media/files/photos/immich/upload
1.3G /media/files/photos/immich/backups
4.4G /media/files/photos/immich/encoded-video
5.0G /media/files/photos/immich/thumbs
325G /media/files/photos/immich/library
mike@markvm2:~$ du -sh /media/files/photos-bk/* | sort -h
0 /media/files/photos-bk/library
0 /media/files/photos-bk/profile
0 /media/files/photos-bk/thumbs
0 /media/files/photos-bk/upload
1.3G /media/files/photos-bk/backups
4.3G /media/files/photos-bk/encoded-video
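To see at a glance which top-level directories the copy actually reached, file counts can be compared per directory alongside the `du` totals. This is only a sketch: `compare_file_counts` is a hypothetical helper, and it counts regular files only, so it says nothing about whether file contents match.

```shell
# Hypothetical helper: for each top-level directory in SRC,
# print "name src_file_count dst_file_count".
compare_file_counts() {
    cfc_src=$1
    cfc_dst=$2
    for cfc_entry in "$cfc_src"/*/; do
        [ -d "$cfc_entry" ] || continue
        cfc_name=$(basename "$cfc_entry")
        cfc_s=$(find "$cfc_entry" -type f | wc -l | tr -d ' ')
        if [ -d "$cfc_dst/$cfc_name" ]; then
            cfc_d=$(find "$cfc_dst/$cfc_name" -type f | wc -l | tr -d ' ')
        else
            cfc_d=0
        fi
        printf '%s %s %s\n' "$cfc_name" "$cfc_s" "$cfc_d"
    done
}

# Example invocation (not executed here):
# compare_file_counts /media/files/photos/immich /media/files/photos-bk
```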
Environment
OS: Debian 12 (Bookworm)
Rsync version: 3.2.7
Immich data directory size: ~335 GB
Filesystem: btrfs
After restarting, this is the RAM usage before running the command:
free -h
total used free shared buff/cache available
Mem: 17Gi 4.9Gi 5.2Gi 90Mi 7.5Gi 12Gi
Swap: 974Mi 0B 974Mi
About 1 minute 30 seconds after starting the command:
free -h
total used free shared buff/cache available
Mem: 17Gi 4.9Gi 161Mi 90Mi 12Gi 12Gi
Swap: 974Mi 12Ki 974Mi
About 5 minutes in:
free -h
total used free shared buff/cache available
Mem: 17Gi 4.8Gi 251Mi 90Mi 12Gi 12Gi
Swap: 974Mi 22Mi 952Mi
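In the three samples above, `free` drops from 5.2Gi to a few hundred MiB while `buff/cache` grows to 12Gi and `available` stays at 12Gi. That pattern usually points to the page cache filling up rather than memory actually running out, so `available` is probably the more useful number to log during a run. A minimal sketch for extracting it, assuming the procps `free -m` column layout shown above; `mem_available_mib` is a hypothetical helper name.

```shell
# Hypothetical helper: print the "available" column (MiB) from
# `free -m` output on stdin. Column 7 of the "Mem:" row is "available"
# in the layout: total used free shared buff/cache available.
mem_available_mib() {
    awk '$1 == "Mem:" { print $7 }'
}

# Example: sample once a minute while the copy runs (not executed here):
# while sleep 60; do
#     printf '%s %s\n' "$(date +%T)" "$(free -m | mem_available_mib)"
# done
```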
Proxmox dashboard for the VM (screenshot)
RAM usage by container:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
7d68d3dc682c open-webui 0.10% 746.8MiB / 17.19GiB 4.24% 124kB / 64.6kB 574MB / 193kB 28
931a7a6019a7 recipe 0.05% 402.1MiB / 17.19GiB 2.28% 136kB / 36.7kB 163MB / 40.7MB 7
85942b12a1a0 event-rallly-1 0.00% 211.3MiB / 17.19GiB 1.20% 47.8kB / 5.3kB 163MB / 889kB 20
442120c32b51 crm2-worker 0.05% 599.7MiB / 17.19GiB 3.41% 7.16MB / 9.29MB 161MB / 73.7kB 29
a636794f40f4 crm2 0.01% 509.1MiB / 17.19GiB 2.89% 1.7MB / 616kB 135MB / 4.96MB 11
5f1725d43a57 crm2-db 0.00% 37.92MiB / 17.19GiB 0.22% 1.72MB / 2.15MB 14.4MB / 471kB 9
a981c94de164 crm2-redis 0.10% 9.957MiB / 17.19GiB 0.06% 8.21MB / 6.66MB 721kB / 4.9MB 5
ee06a26cb1f3 booking-studio-1 0.00% 163.7MiB / 17.19GiB 0.93% 23.8kB / 4.17kB 90MB / 184kB 30
009cca8814f9 database-cal 0.00% 24.77MiB / 17.19GiB 0.14% 11.3kB / 0B 4.92MB / 119kB 6
1c98d3a1756d n8n 0.11% 329.3MiB / 17.19GiB 1.87% 18.5kB / 8.44kB 186MB / 47.4MB 20
790ea0b54427 ghost 0.00% 159.4MiB / 17.19GiB 0.91% 106kB / 22kB 117MB / 45.1kB 11
dfbe27e76836 ghost-db 3.77% 405.3MiB / 17.19GiB 2.30% 31.2kB / 76.2kB 98.7MB / 70MB 46
3786db9ce5e0 outline 0.01% 616.7MiB / 17.19GiB 3.50% 1.89MB / 4MB 184MB / 8.19kB 36
2c1b42682fb9 outline-db 0.00% 24.43MiB / 17.19GiB 0.14% 87.8kB / 142kB 5.14MB / 311kB 6
f2a536d70512 outline-redis 0.10% 14.19MiB / 17.19GiB 0.08% 3.93MB / 1.72MB 9.64MB / 12.3kB 5
e2479bd3b2bb recipe-db 0.00% 44.03MiB / 17.19GiB 0.25% 48.2kB / 125kB 34.8MB / 303kB 6
0bb6f7ecbb47 event-rallly_db-1 0.00% 46.2MiB / 17.19GiB 0.26% 16.6kB / 25.4kB 45.2MB / 2.92MB 8
0cd3c3b33a51 mixpost-redis-1 0.00% 0B / 0B 0.00% 0B / 0B 0B / 0B 0
c7a9d770d3b8 gitea4 0.03% 93.32MiB / 17.19GiB 0.53% 28.4kB / 21.8kB 5.77MB / 28.7kB 16
7a6d113b73ce test-os 0.00% 4.062MiB / 17.19GiB 0.02% 11.5kB / 0B 4.7MB / 0B 1
e8ea78f111d3 portainer 0.00% 39.22MiB / 15.29GiB 0.25% 346kB / 2.98MB 54.7MB / 2.57MB 10
Also, restarting did work.