immich [BUG] Interrupted mobile uploads leave corrupt files

The bug

Once every few thousand uploads, something goes wrong and an upload stops mid-file. This causes errors during file processing, and the affected uploads end up stuck in the untracked files section of the repair tab.

There are a few issues with this:

1. Apparently, partial uploads are not detected by the upload code. Given that some partial uploads can be processed as if they were correct files, there is no way to know whether part of the library is corrupt.
2. The files that are partial enough (not all!) to land in the untracked files section stay stuck there. There is some (older?) documentation referring to a "Remove Offline Files" job, but that job does not seem to exist for the default user library.
3. Even though the server knows some files can't be processed (if you are lucky; sometimes it thinks the partial files are just fine), the mobile app incorrectly shows that everything is fine.

The most egregious issue to me seems to be issue number 1, especially for broken uploads that don't get detected as corrupt. A relatively easy fix would be for the app to upload not only the file but also its checksum, and for the server to accept the file only if the checksum matches. If it does not, drop the upload and ask the app to try again. Or even simpler: upload to an example.jpg.partial file that is ignored by the server, and rename it to example.jpg when the upload is done.
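The two ideas above can be combined. The following is only a rough sketch of the suggestion, not how Immich actually handles uploads: the function name `receive_upload`, the `ChecksumMismatch` exception, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import os
from pathlib import Path
from typing import Iterable


class ChecksumMismatch(Exception):
    """Raised when the received bytes do not match the client's checksum."""


def receive_upload(chunks: Iterable[bytes], final_path: Path, expected_sha256: str) -> Path:
    """Hypothetical server-side handler: write to '<name>.partial',
    verify the checksum, and only then rename to the final name."""
    partial_path = final_path.with_name(final_path.name + ".partial")
    digest = hashlib.sha256()
    try:
        with open(partial_path, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
                digest.update(chunk)
            out.flush()
            os.fsync(out.fileno())  # make sure the bytes are on disk before renaming
        if digest.hexdigest() != expected_sha256.lower():
            raise ChecksumMismatch(f"checksum mismatch for {final_path.name}")
        # Rename within the same directory, so the final name only ever
        # points at a fully written, verified file.
        os.replace(partial_path, final_path)
    except BaseException:
        # Interrupted or corrupt upload: drop the partial file so the client can retry.
        partial_path.unlink(missing_ok=True)
        raise
    return final_path
```

With this shape, an interrupted connection either leaves nothing behind or leaves a `.partial` file that processing can ignore by name, and a truncated file can never be mistaken for a complete one.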

Update: Seems @ItalyPaleAle already ran into this wall once before. Not sure why https://github.com/immich-app/immich/issues/4532 was closed.

The OS that Immich Server is running on

Unraid v6.12.10

Version of Immich Server

v1.105.1

Version of Immich Mobile App

v1.105.0

Platform with the issue

[X] Server
[ ] Web
[X] Mobile

Your docker-compose.yml content

# backup stack

networks:
  default:
    name: backup
services:

  immich:
    container_name: backup-immich
    image: ghcr.io/immich-app/immich-server:release
    command: ['start.sh', 'immich']
    restart: always
    depends_on:
      - redis-immich
      - postgres-immich
    volumes:
      - /mnt/user/immich:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100
      REDIS_HOSTNAME: redis-immich
      DB_HOSTNAME: postgres-immich
      DB_DATABASE_NAME: immich
      DB_USERNAME: postgres
      DB_PASSWORD: REDACTED
    labels:
      traefik.enable: true
      traefik.http.services.immich-backup.loadbalancer.server.port: 3001

  microservices-immich:
    container_name: backup-immich-microservices
    image: ghcr.io/immich-app/immich-server:release
    command: ['start.sh', 'microservices']
    restart: always
    depends_on:
      - redis-immich
      - postgres-immich
    volumes:
      - /mnt/user/immich:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["GPU-e1474488-7fa9-85a5-803b-59c645d71e0d"]
              capabilities: [gpu]
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100
      NVIDIA_DRIVER_CAPABILITIES: all
      NVIDIA_VISIBLE_DEVICES: all
      REDIS_HOSTNAME: redis-immich
      DB_HOSTNAME: postgres-immich
      DB_DATABASE_NAME: immich
      DB_USERNAME: postgres
      DB_PASSWORD: REDACTED

  machinelearning-immich:
    container_name: backup-immich-machinelearning
    image: ghcr.io/immich-app/immich-machine-learning:release-cuda
    restart: always
    volumes:
      - /mnt/user/cache/backup/immich/machinelearning:/cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["GPU-e1474488-7fa9-85a5-803b-59c645d71e0d"]
              capabilities: [gpu]
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100
      NVIDIA_DRIVER_CAPABILITIES: all
      NVIDIA_VISIBLE_DEVICES: all

  redis-immich:
    container_name: backup-immich-redis
    image: registry.hub.docker.com/library/redis:6.2-alpine@sha256:84882e87b54734154586e5f8abd4dce69fe7311315e2fc6d67c29614c8de2672
    restart: always
    volumes:
      - /mnt/user/cache/backup/immich/redis:/data
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100

  postgres-immich:
    container_name: backup-immich-postgres
    image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always
    volumes:
      - /mnt/user/containers/backup/immich-postgres:/var/lib/postgresql/data
    environment:
      TZ: Europe/Amsterdam
      PUID: 99
      PGID: 100
      POSTGRES_INITDB_ARGS: '--data-checksums'
      POSTGRES_DB: immich
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: REDACTED

Your .env content

In-lined into the compose file.

Reproduction steps

Attempt mobile uploads on a large number of files and mess with the connection by:

- Rapidly switching the app between foreground and background during the upload.
- Switching between WiFi and mobile networks.
- Killing and restarting the app a few times.

Then check the server's repair page. Some files will be listed as untracked.

Relevant log output

Can't find the relevant log anymore. But it was something generic about not being able to read the file. This is expected as the file has only been partially uploaded.

Additional information

This is an (intentionally very low-res) screenshot of the two files compared. The uploaded file is on the left, the original file on the right. You can clearly see how the uploaded file's size, about 10% of the original, is reflected in the image.

Jun 03 '24 15:06 SixFive7

Related as these three PRs seem to already contain 80% of the required code: https://github.com/immich-app/immich/pull/9306 https://github.com/immich-app/immich/pull/2072 https://github.com/immich-app/immich/pull/7135

Jun 03 '24 15:06 SixFive7

This seems like a direct fit for the issue I am seeing and mentioned elsewhere.

I am posting specifically because the OP mentions "Every few thousand uploads something goes wrong", but this can be made much worse under certain conditions.

In my case, after returning from a family vacation to an area of the world with spotty, slow internet and daily power cuts, I ended up with 0.5TB of untracked files (mainly video) and a very real worry that I do not have a proper backup of my vacation photos and videos.

Unless I am misinterpreting this issue, it could arguably be classed as "potential data loss", and if so, this is as serious as it gets. Hopefully I am wrong about this.

Aug 05 '24 22:08 nomandera

@nomandera it is correct that you are misinterpreting this. An interrupted upload doesn't send the complete event to the server, so the file will be re-uploaded.

Aug 05 '24 23:08 alextran1502

@alextran1502

> @nomandera it is correct that you are misinterpreting this. An interrupted upload doesn't send the complete event to the server, so the file will be re-uploaded.

No, interrupted uploads lead to files that will never upload completely. The upload stops very early on (a few MB uploaded each time, then it errors out).

The app then retries the upload indefinitely, but it always fails with this error:

Immich Backup error Failed to backup assets. Retrying...

If you try to do a manual backup, it still fails (screenshot: Screenshot_20240805-220319).

Aug 06 '24 05:08 Snuupy

@Snuupy that error seems to be from your reverse proxy. Try connecting via the local IP.

Aug 06 '24 05:08 alextran1502

@alextran1502 you're right, I connected on local port and it succeeded. I am now confused as to why the reverse proxy broke. Here is the swag config:

server {
    listen 443 ssl;
    listen [::]:443 ssl;

    server_name immich.*;

    include /config/nginx/ssl.conf;

    client_max_body_size 50000M;

    access_log off;

    location / { # web
        include /config/nginx/resolver.conf;

        proxy_buffering off;

        proxy_http_version 1.1; 
        proxy_set_header Host $host; 
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; 
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_redirect off;
        
        # set timeout
        proxy_read_timeout 600s;
        proxy_send_timeout 600s;
        send_timeout       600s;
        
        set $upstream_app immich-server;
        set $upstream_port 3001;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }

}

I will check the nginx proxy configs to see if I can find out why it's broken. Thanks.

Aug 06 '24 05:08 Snuupy

> @nomandera it is correct that you are misinterpreting this. An interrupted upload doesn't send the complete event to the server, so the file will be re-uploaded.

@alextran1502 excellent, and thank you for the fast clarification. I am really glad I asked, as it wasn't obvious to me from reading all the discussion posts and from my symptoms that this would be the case, and I suspect I wasn't alone in this interpretation. I will delete my 500GB of noise now, with the confidence of knowing it is just leftovers and not unique.

Final question to close this off in my mind.

Does it follow, then, that if the leftover files do not reoccur after cleanup, we can say categorically that the client has since backed them up successfully? Essentially, I am just looking for a way to be confident that all my family's clients have backed up, i.e., if I see new client images and no noise, I can conclude they are fully backed up.

Aug 06 '24 07:08 nomandera

Reverse proxy issues aside, I still have the bug mentioned in this issue where backup data is corrupted while everything is showing green. I can even still reliably reproduce it.

Aug 06 '24 07:08 SixFive7

This seems to directly contradict my assumptions, so now I am even more confused. The volume of media is such that I can't manually spot-check this in any meaningful way, and I am worried.

Update:

I believe I have located my prime offender: my youngest kid's phone, which seems to struggle to upload videos (it's not the best phone). This was pretty impactful for me because I ended up with 700GB of corrupt videos stuck on the server, which was enough to fill a drive and cause havoc, with Docker bringing down the whole server. If the fix for accumulating files is complex or far off, can I suggest that at the very least some sort of free-space check be added?

Update 2: This issue has not reoccurred for me since I "fixed" the child's phone.
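To illustrate what such a space check could look like, here is a minimal sketch. It is not Immich code; the function name `has_room_for_upload` and the 5 GiB default reserve are made-up assumptions.

```python
import shutil
from pathlib import Path


def has_room_for_upload(upload_dir: Path, incoming_bytes: int,
                        reserve_bytes: int = 5 * 1024**3) -> bool:
    """Hypothetical guard: accept an upload only if, after storing it,
    at least `reserve_bytes` would remain free on the upload volume."""
    free = shutil.disk_usage(upload_dir).free
    return free - incoming_bytes >= reserve_bytes
```

A server could call something like this before accepting each upload and refuse with a clear error instead of filling the drive.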

Aug 07 '24 08:08 nomandera

Just checking in on this. Does anyone know what the current state of this issue is?

I was very much looking forward to deploying Immich, but any question of data corruption (and even worse, undetected data corruption) on a photo backup solution is an absolute show-stopper. Echoing @nomandera's comments, I can't imagine a worse failure mode.

Does Immich do any kind of checksumming/file verification?

For reference, this is how Nextcloud does it: https://help.nextcloud.com/t/does-the-nextcloud-client-add-checksum-verifications-when-uploading/193040/2

Thank you in advance.

Sep 30 '24 17:09 Torqu3Wr3nch

The situation remains unchanged for me; the bug still occurs. It is easily testable as well. I simply sync everything with Immich and then compare the checksums of the folder on the server with the same folder synced by Syncthing. If I kill the app a few times, there are always some partially uploaded files.
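For anyone who wants to run the same comparison, this is roughly how it can be scripted. It is a sketch under assumptions: the helper names are made up, SHA-256 is an arbitrary choice, and files are matched by content rather than by name because a storage template may rename them on the server.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_without_identical_copy(reference: Path, library: Path) -> list[str]:
    """Return reference files (e.g. the Syncthing copy) that have no
    byte-identical counterpart anywhere under the library folder."""
    library_hashes = {file_sha256(p) for p in library.rglob("*") if p.is_file()}
    return sorted(
        p.relative_to(reference).as_posix()
        for p in reference.rglob("*")
        if p.is_file() and file_sha256(p) not in library_hashes
    )
```

Any path this returns is a file whose complete contents never arrived on the server, whether it is missing entirely or only present as a partial upload.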

I also wholeheartedly agree that silent data corruption is a worst-case scenario bug. It is the main, if not the only, blocker for me as well. That said, I respect the amount of work that goes into a project like Immich, so I will respectfully abide until a developer has time to fix this. When that happens, I can at least provide some thorough testing.

Sep 30 '24 23:09 SixFive7

Oh certainly; I too am appreciative of @alextran1502's efforts (in case it's not clear, thank you, Alex). I'm only disappointed because I'm looking forward to using the app but with this kind of a bug, I simply cannot use it yet.

I was actually hoping you would respond, @SixFive7. As I reread your initial post and responses, it seems like the errors/corrupted uploads are in fact detected. You do see all affected files in the untracked files section of the repair tab, correct?

> Apparently partial uploads are not detected by the upload code. Given that some partial uploads might be able to get processed as if they were a correct file this means there is no way to know if a part of the library is corrupt.

The partial uploads are detected though, aren't they? In the sense that you can find them in the repair tab, right? So you could use this to determine if part of the library is corrupt, correct?

> The files that are partial enough (not all!) to get stuck in the untracked files sections are stuck there. There is some (older?) documentation referring to a "Remove Offline Files" job that does not seem to exist for the default user library?

So not all partial/corrupted files show up in the untracked section? This is the scenario I am most worried about.

Like I said, I haven't yet deployed Immich; I just don't want to end up in a situation where I think my data is protected but it is not.

Thank you again everyone for your responses in advance!

Oct 01 '24 02:10 Torqu3Wr3nch

We now clean up these files.

Jan 10 '25 18:01 mmomjian

> We now cleanup these files

@mmomjian Do you have a reference to the relevant code, documentation or release note? I can't find anything and I still have untracked files. All are from partial uploads. The most recent app + server version does seem to re-upload everything and my storage template moves all the good files out. But the partial files are left behind and not cleaned up. Any clue would be helpful!

Apr 16 '25 23:04 SixFive7

@SixFive7 is this an older instance or a new one? Leftover files from before the cleanup was implemented aren't cleaned up automatically.

Apr 17 '25 00:04 alextran1502

@alextran1502 This is a brand new setup. Server and client were installed today. Thought I'd give Immich another go after this issue got closed. Uploaded 1,809 files, incl. some video files, whilst switching from mobile to WiFi to airplane mode and resuming. Only 1 orphaned file, so it is an improvement. Also, the single untracked file seems to be a partial upload of a video that also sits as a complete copy in my album, so the system seems to have retried the upload and succeeded. The only remaining question is whether the failed upload will clean itself up at some point.

Apr 17 '25 01:04 SixFive7

@alextran1502 It has been almost half a year and the file has not been cleaned up yet. Immich has always been kept up to date and is now running v1.140.1, but the orphaned file is still there. Can we re-open the issue until the system automatically cleans up partial uploads?

Sep 09 '25 00:09 SixFive7
