immich
[BUG] Interrupted mobile uploads leave corrupt files
The bug
Every few thousand uploads, something goes wrong and an upload is stopped mid-file. The truncated file then causes errors during file processing, and as a result these uploads are stuck in the untracked files section of the repair tab.
There are a few issues with this:
- Partial uploads are apparently not detected by the upload code. Since some partial uploads can still be processed as if they were valid files, there is no way to know whether part of the library is corrupt.
- The files that are partial enough (not all!) to get stuck in the untracked files section stay stuck there. There is some (older?) documentation referring to a "Remove Offline Files" job, but that job does not seem to exist for the default user library.
- The server knows that some files can't be processed (if you are lucky; sometimes it thinks the partial files are just fine), yet the mobile app incorrectly shows that everything is fine.
The most egregious issue to me seems to be issue number 1, especially for broken uploads that don't get detected as corrupt. A relatively easy fix would be for the app to upload not only the file but also its checksum, and for the server to accept the file only if the checksum matches. If it doesn't, the server drops the upload and asks the app to try again. An even simpler option: upload to an example.jpg.partial file that the server ignores, and rename it to example.jpg once the upload is done.
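The checksum-plus-`.partial` idea proposed above could look roughly like this on the server side. This is a minimal Python sketch of the technique, not Immich's actual upload code: the function name `store_upload`, the `expected_sha1` parameter and the choice of SHA-1 are hypothetical assumptions for illustration. The upload is streamed into `<name>.partial`, hashed while it is written, and only renamed to its final name if the hash matches what the client sent.

```python
import hashlib
import os
from typing import BinaryIO


def store_upload(stream: BinaryIO, final_path: str, expected_sha1: str,
                 chunk_size: int = 1 << 20) -> bool:
    """Hypothetical handler: stream into '<final_path>.partial', verify, then rename.

    Returns True if the file was stored under final_path, or False if the
    checksum did not match (the caller would then ask the client to retry).
    """
    partial_path = final_path + ".partial"  # assumed to be ignored by the media scanner
    digest = hashlib.sha1()
    try:
        with open(partial_path, "wb") as out:
            # Hash the bytes as they are written, so no second read is needed.
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                digest.update(chunk)
                out.write(chunk)
            out.flush()
            os.fsync(out.fileno())  # ensure the data is on disk before the rename
    except BaseException:
        # Connection dropped or write failed: do not leave a partial file behind.
        if os.path.exists(partial_path):
            os.remove(partial_path)
        raise

    if digest.hexdigest() != expected_sha1.lower():
        os.remove(partial_path)  # truncated or corrupted upload: drop it
        return False

    os.replace(partial_path, final_path)  # atomic when both paths are on one filesystem
    return True
```

Because `os.replace` is atomic on the same filesystem, nothing ever sees a half-written `example.jpg`. A hard crash of the server process could still leave a `.partial` file behind, so a scanner would need to skip that suffix and some periodic job would need to remove stale ones.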
Update: Seems @ItalyPaleAle already ran into this wall once before. Not sure why https://github.com/immich-app/immich/issues/4532 was closed.
The OS that Immich Server is running on
Unraid v6.12.10
Version of Immich Server
v1.105.1
Version of Immich Mobile App
v1.105.0
Platform with the issue
- [X] Server
- [ ] Web
- [X] Mobile
Your docker-compose.yml content
# backup stack
networks:
  default:
    name: backup
services:
  immich:
    container_name: backup-immich
    image: ghcr.io/immich-app/immich-server:release
    command: ['start.sh', 'immich']
    restart: always
    depends_on:
      - redis-immich
      - postgres-immich
    volumes:
      - /mnt/user/immich:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100
      REDIS_HOSTNAME: redis-immich
      DB_HOSTNAME: postgres-immich
      DB_DATABASE_NAME: immich
      DB_USERNAME: postgres
      DB_PASSWORD: REDACTED
    labels:
      traefik.enable: true
      traefik.http.services.immich-backup.loadbalancer.server.port: 3001
  microservices-immich:
    container_name: backup-immich-microservices
    image: ghcr.io/immich-app/immich-server:release
    command: ['start.sh', 'microservices']
    restart: always
    depends_on:
      - redis-immich
      - postgres-immich
    volumes:
      - /mnt/user/immich:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["GPU-e1474488-7fa9-85a5-803b-59c645d71e0d"]
              capabilities: [gpu]
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100
      NVIDIA_DRIVER_CAPABILITIES: all
      NVIDIA_VISIBLE_DEVICES: all
      REDIS_HOSTNAME: redis-immich
      DB_HOSTNAME: postgres-immich
      DB_DATABASE_NAME: immich
      DB_USERNAME: postgres
      DB_PASSWORD: REDACTED
  machinelearning-immich:
    container_name: backup-immich-machinelearning
    image: ghcr.io/immich-app/immich-machine-learning:release-cuda
    restart: always
    volumes:
      - /mnt/user/cache/backup/immich/machinelearning:/cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["GPU-e1474488-7fa9-85a5-803b-59c645d71e0d"]
              capabilities: [gpu]
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100
      NVIDIA_DRIVER_CAPABILITIES: all
      NVIDIA_VISIBLE_DEVICES: all
  redis-immich:
    container_name: backup-immich-redis
    image: registry.hub.docker.com/library/redis:6.2-alpine@sha256:84882e87b54734154586e5f8abd4dce69fe7311315e2fc6d67c29614c8de2672
    restart: always
    volumes:
      - /mnt/user/cache/backup/immich/redis:/data
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100
  postgres-immich:
    container_name: backup-immich-postgres
    image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    command: ["postgres", "-c", "shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always
    volumes:
      - /mnt/user/containers/backup/immich-postgres:/var/lib/postgresql/data
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100
      POSTGRES_INITDB_ARGS: '--data-checksums'
      POSTGRES_DB: immich
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: REDACTED
Your .env content
In-lined into the compose file.
Reproduction steps
Attempt mobile uploads on a large number of files and mess with the connection by:
- Rapidly changing between foreground and background settings during the upload.
- Switch between WiFi and mobile networks.
- Kill and restart the app a few times.
Then check the server repair page. There will be some untracked files listed.
Relevant log output
I can't find the relevant log anymore, but it was something generic about not being able to read the file. This is expected, as the file was only partially uploaded.
Additional information
This is an (intentionally very low-res) screenshot comparing the two files: on the left the uploaded file, on the right the original. The uploaded file is only about 10% of the original's size, and that is clearly reflected in the image.
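Truncated JPEGs like the one in the screenshot can often be spotted without the original: a JPEG starts with the SOI marker `FF D8` and normally ends with the EOI marker `FF D9`. The sketch below is a rough triage heuristic, not something Immich does; `looks_truncated_jpeg` and `find_truncated_jpegs` are hypothetical helpers. Files that legitimately carry data after the EOI marker (for example some motion photos) will be flagged as false positives, so treat hits as candidates to compare against the original rather than as proof of corruption.

```python
import os
from typing import List

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_truncated_jpeg(path: str) -> bool:
    """Heuristic: True if the file starts like a JPEG but does not end with EOI."""
    with open(path, "rb") as f:
        if f.read(2) != JPEG_SOI:
            return False  # not a JPEG, so this check does not apply
        size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        f.seek(max(0, size - 2))
        tail = f.read(2)
    return tail != JPEG_EOI


def find_truncated_jpegs(root: str) -> List[str]:
    """Return .jpg/.jpeg files under root that fail the EOI check."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".jpg", ".jpeg")):
                path = os.path.join(dirpath, name)
                if looks_truncated_jpeg(path):
                    hits.append(path)
    return sorted(hits)
```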
Related: these three PRs seem to already contain about 80% of the required code: https://github.com/immich-app/immich/pull/9306 https://github.com/immich-app/immich/pull/2072 https://github.com/immich-app/immich/pull/7135
This seems like a direct fit for the issue I am seeing and mentioned elsewhere.
I am posting specifically because the OP mentions that "every few thousand uploads something goes wrong", but this can be made much worse under certain conditions.
In my case, after returning from a family vacation to an area of the world with spotty, slow internet and daily power cuts, I ended up with 0.5 TB of untracked files (mainly video) and a very real worry that I do not have a proper backup of the vacation photos and videos.
Unless I am misinterpreting this issue, it could arguably be classed as "potential data loss", and if so, this is as serious as it gets. Hopefully I am wrong about this.
@nomandera Yes, you are misinterpreting this. An interrupted upload doesn't send the complete event to the server, so the file will be re-uploaded.
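The retry behavior described in that reply, where an asset only counts as backed up once its upload has completed, can be modeled as a simple backup pass. This is a hypothetical Python sketch of the idea, not the mobile app's actual code: `run_backup_pass`, `backed_up` and `upload_fn` are illustrative names. It only covers the client's bookkeeping; it does nothing about the bytes an interrupted attempt may already have left on the server, which is what this issue is about.

```python
from typing import Callable, Dict, Iterable, Set


def run_backup_pass(asset_ids: Iterable[str], backed_up: Set[str],
                    upload_fn: Callable[[str], bool]) -> Dict[str, str]:
    """One backup pass. An asset is recorded in backed_up only after upload_fn
    reports a completed upload; anything else stays pending for the next pass."""
    results: Dict[str, str] = {}
    for asset_id in asset_ids:
        if asset_id in backed_up:
            results[asset_id] = "skipped"
            continue
        try:
            ok = upload_fn(asset_id)
        except (ConnectionError, TimeoutError):
            ok = False  # interrupted: no completion recorded, so it is retried later
        if ok:
            backed_up.add(asset_id)
            results[asset_id] = "uploaded"
        else:
            results[asset_id] = "pending"
    return results
```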
@alextran1502
No, interrupted uploads lead to files that never upload completely: each attempt stops very early (a few MB are uploaded each time, then it errors out).
The app then tries to re-upload them indefinitely but always fails,
with the error: Immich Backup error Failed to backup assets. Retrying...
If you try a manual backup, it still fails:
@Snuupy that error seems to come from your reverse proxy; try the local IP.
@alextran1502 You're right: I connected via the local port and it succeeded. Now I'm confused as to why the reverse proxy broke. Here is the SWAG config:
server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name immich.*;
    include /config/nginx/ssl.conf;
    client_max_body_size 50000M;
    access_log off;
    location / { # web
        include /config/nginx/resolver.conf;
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_redirect off;
        # set timeout
        proxy_read_timeout 600s;
        proxy_send_timeout 600s;
        send_timeout 600s;
        set $upstream_app immich-server;
        set $upstream_port 3001;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }
}
I will check the nginx proxy configs to see if I can find out why it's broken. Thanks.
@alextran1502 Excellent, and thank you for the fast clarification. I am really glad I asked: from reading all the discussion posts and from my symptoms, it wasn't obvious to me that this would be the case, and I suspect I wasn't alone in this interpretation. I will now delete my 500 GB of noise, confident that it is just leftovers and not unique.
Final question to close this off in my mind.
Does it follow, then, that if the leftover files do not recur after cleanup, we can say categorically that the client has since successfully backed these files up? Essentially I am looking for a way to be confident that all my family's clients have backed up, i.e. if I see new client images and no noise, I can conclude they are fully backed up.
Reverse proxy issues aside, I still have the bug mentioned in this issue where backup data is corrupted while everything is showing green. I can even still reliably reproduce it.
This seems to directly contradict my assumptions, so now I am even more confused. The volume of media is such that I can't manually spot-check this in any meaningful way, and I am worried.
Update:
I believe I have located my prime offender: my youngest kid's phone seems to struggle to upload videos (it's not the best phone). This was pretty impactful for me, because I ended up with 700 GB of stuck, corrupt videos, which was enough to fill a drive and cause havoc, with Docker bringing down the whole server. If the solution to files accumulating is complex or far off, may I suggest that at the very least some sort of free-space check be added.
Update 2: This issue has not recurred for me since I "fixed" my child's phone.
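A free-space guard of the kind suggested above could be as simple as refusing an upload that would push free space below a threshold. This is a minimal sketch using Python's `shutil.disk_usage`; the function name `has_room_for_upload`, the idea of passing the client-declared size as `incoming_bytes`, and the 10 GiB default threshold are arbitrary assumptions, not Immich settings.

```python
import shutil


def has_room_for_upload(upload_dir: str, incoming_bytes: int,
                        min_free_bytes: int = 10 * 1024**3) -> bool:
    """Hypothetical guard: accept an upload only if at least min_free_bytes
    would remain free on the upload volume after storing it."""
    free = shutil.disk_usage(upload_dir).free
    return free - incoming_bytes >= min_free_bytes
```

A server using such a check could reject the request up front (for example with HTTP 507 Insufficient Storage) instead of letting the disk fill and take other containers down with it.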
Just checking in on this. Does anyone know what the current state of this issue is?
I was very much looking forward to deploying Immich, but any question of data corruption (and even worse, undetected data corruption) on a photo backup solution is an absolute show-stopper. Echoing @nomandera's comments, I can't imagine a worse failure mode.
Does Immich do any kind of checksumming/file verification?
For reference, this is how Nextcloud does it: https://help.nextcloud.com/t/does-the-nextcloud-client-add-checksum-verifications-when-uploading/193040/2
Thank you in advance.
The situation remains unchanged for me; the bug still occurs. It is easily testable as well. I simply sync everything with Immich and then compare the checksums of the folder on the server with the same folder synced by Syncthing. If I kill the app a few times, there are always some partially uploaded files.
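The checksum comparison described above can be automated. This sketch hashes every file under a reference copy (for example the Syncthing folder) and under the Immich originals folder, and matches them by content rather than by path, because a storage template may rename files. The function names and folder layout are assumptions; point `immich_root` at the folder holding original uploads only, since thumbnails and transcoded videos would otherwise all show up as unexpected.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos do not need to fit in memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_tree(root: str) -> Dict[str, Path]:
    """Map content hash to one file with that content (duplicates collapse)."""
    return {sha1_of(p): p for p in Path(root).rglob("*") if p.is_file()}


def compare_trees(reference_root: str, immich_root: str) -> Tuple[List[Path], List[Path]]:
    """Return (missing, unexpected).

    missing: reference files with no byte-identical copy in Immich.
    unexpected: Immich files matching nothing in the reference,
    which is where truncated uploads would appear.
    """
    ref = hash_tree(reference_root)
    imm = hash_tree(immich_root)
    missing = sorted(p for h, p in ref.items() if h not in imm)
    unexpected = sorted(p for h, p in imm.items() if h not in ref)
    return missing, unexpected
```

An empty `missing` list is the "everything is backed up" signal asked about earlier in the thread; a non-empty `unexpected` list points at leftovers such as partial uploads.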
I also wholeheartedly agree that silent data corruption is a worst-case scenario bug. It is the main, if not the only, blocker for me as well. That said, I respect the amount of work that goes into a project like Immich, so I will respectfully abide until a developer has time to fix this. When that happens, I can at least provide some thorough testing.
Oh certainly; I too am appreciative of @alextran1502's efforts (in case it's not clear, thank you, Alex). I'm only disappointed because I'm looking forward to using the app but with this kind of a bug, I simply cannot use it yet.
I was actually hoping you would respond, @SixFive7. As I reread your initial post/responses, it seems like the errors/corrupted uploads are in fact detected, you do see all affected files in the untracked errors section of the repair tab, correct?
> Apparently partial uploads are not detected by the upload code. Given that some partial uploads might be able to get processed as if they were a correct file this means there is no way to know if a part of the library is corrupt.
The partial uploads are detected though, aren't they? In the sense that you can find them in the repair tab, right? So you could use this to determine if part of the library is corrupt, correct?
> The files that are partial enough (not all!) to get stuck in the untracked files sections are stuck there. There is some (older?) documentation referring to a "Remove Offline Files" job that does not seem to exist for the default user library?
So not all partial/corrupted files show up in the untracked section? This is the scenario I am most worried about.
Like I said, I haven't yet deployed Immich, I just don't want to end up in a situation where I think my data is protected but it is not.
Thank you again everyone for your responses in advance!
We now clean up these files.
@mmomjian Do you have a reference to the relevant code, documentation or release note? I can't find anything and I still have untracked files. All are from partial uploads. The most recent app + server version does seem to re-upload everything and my storage template moves all the good files out. But the partial files are left behind and not cleaned up. Any clue would be helpful!
@SixFive7 Is this an older instance or a new one? Leftover files created before this was implemented aren't cleaned up automatically.
@alextran1502 This is a brand-new setup; server and client were installed today. I thought I'd give Immich another go after this issue was closed. I uploaded 1,809 files, including some videos, while switching between mobile data, Wi-Fi and airplane mode and resuming. Only one file was orphaned, so it is an improvement. The single untracked file seems to be a partial upload of a video that also has a complete copy in my album, so the system seems to have retried the upload and succeeded. The only remaining question is whether the failed upload will be cleaned up at some point.
@alextran1502 It has been almost half a year and the file has still not been cleaned up. Immich has always been kept up to date and is now running v1.140.1, but the orphaned file is still there. Can we reopen the issue until the system automatically cleans up partial uploads?
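An automatic cleanup like the one requested could be built around a grace period, so that uploads still in progress are never touched. The sketch below only lists candidates and leaves deletion to the caller; `find_stale_untracked`, the `tracked_paths` input (which would have to come from the asset records in the database) and the 24-hour default are hypothetical, and this is not Immich's actual cleanup job.

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional


def find_stale_untracked(root: str, tracked_paths: Iterable[str],
                         grace_seconds: float = 24 * 3600,
                         now: Optional[float] = None) -> List[Path]:
    """List files under root that no asset record points to and that have
    not been modified for grace_seconds. Deleting them is left to the caller."""
    tracked = {Path(p).resolve() for p in tracked_paths}
    cutoff = (time.time() if now is None else now) - grace_seconds
    stale = []
    for p in Path(root).rglob("*"):
        # Recently modified files may belong to an upload that is still running.
        if p.is_file() and p.resolve() not in tracked and p.stat().st_mtime < cutoff:
            stale.append(p)
    return sorted(stale)
```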