Move to client side hashing

Open jrasm91 opened this issue 2 years ago • 14 comments

Feature detail

Before upload, compute a client side hash in the mobile app and use that (eventually in combination with #731) to determine if an asset should be uploaded.
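A minimal sketch of the idea, not the app's actual implementation: hash each local file in chunks, then skip any file whose digest the server already knows. SHA-1 is picked here only for illustration, and `file_sha1`, `assets_to_upload`, and `known_hashes` are hypothetical names; how the client would obtain the server's set of hashes is left open.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Set

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large videos never sit fully in memory


def file_sha1(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assets_to_upload(paths: Iterable[Path], known_hashes: Set[str]) -> List[Path]:
    """Keep only the files whose digest is not in the (assumed) server-side set."""
    return [p for p in paths if file_sha1(p) not in known_hashes]
```

With a check like this, only the digests need to cross the network for assets that are already backed up, rather than the files themselves.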

Platform

Mobile App

jrasm91 avatar Feb 05 '23 02:02 jrasm91

As an alternative to a simple SHA hash of the file, I suggest using an algorithm that can detect near-duplicate photos: photos that are visually identical but produce different hashes because of compression, resizing, or filetype conversion. I've used this library written in Go to create a duplicate image finder which could pick up duplicates between originals on my iPhone and those which had been through Google Photos' compression algo. Photoprism is also using this library to detect duplicates.

However, I'm not totally sure how something like this would be used to prevent the client from uploading an existing duplicate to the server as it doesn't generate a unique artifact like hashing does. But I figured it was worth mentioning while this is being considered.
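To illustrate why a perceptual hash does not behave like a unique artifact, here is a rough sketch of a generic difference hash (dHash). It is not the algorithm of the Go library mentioned above, which I can't vouch for; `dhash` and `hamming_distance` are hypothetical names, and the input is assumed to be a small grayscale grid that has already been downscaled from the image.

```python
from typing import Sequence


def dhash(pixels: Sequence[Sequence[int]]) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair.

    `pixels` is a small grayscale grid (for example 8 rows of 9 values)
    that the caller has already produced by downscaling the image.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            # Bit is 1 when brightness drops from left to right.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")
```

Two images count as near duplicates when the Hamming distance between their hashes falls under some chosen threshold, so the comparison is a distance check rather than an equality lookup. That is the difficulty described above: there is no single value the client could send for an exact match.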

mike-lloyd03 avatar Feb 05 '23 04:02 mike-lloyd03

@mike-lloyd03 if we implement similarity detection that will be server side only. The current implementation is hash only. You can track #644 if interested in the fuzzy deduplication.

bo0tzz avatar Feb 05 '23 09:02 bo0tzz

As mentioned by @bo0tzz, I believe we should scope this issue to only duplicate detection rather than similar photo detection as it is a more foundational functionality of backup and sync.

Scenario

The main scenario is when the phone is reinitialized: the Immich app loses its sync state and recognizes all photos as new photos.

  • This incurs huge and superfluous data transfer in the deduplication process as it has to upload all assets to the remote server for hash computation.
  • On the server side, a large amount of unnecessary CPU cycles/memory are also expended in handling the download of the entire duplicated photo library.

Why this deserves priority

  • Reinitialization of a mobile device is not a frequent event, but being able to synchronize state effectively is arguably the most important part of a backup/sync application.
  • The upload of the library is, without a doubt, much more battery intensive than computing the hash of all images locally on the device.
  • Since Immich is self-hosted, we're not just expending battery on the mobile device, but also server compute power, so the impact is more significant to users than a hosted service.

Own context

  • I have 56GB of photos on my mobile and recently reinitialized the phone, and now the Immich app has to upload all 56GB of photos for its state to be in sync with the Immich server

ikaruswill avatar Mar 09 '23 08:03 ikaruswill

+1 for client side hashing + deduplicating similar (not identical) photos. Similar looking photos could be collapsed into a single one (similar to how a burst photo is shown) on iOS

nijhawank avatar Mar 12 '23 04:03 nijhawank

Is there any update on this FR? This is a blocking point for my iOS device users, as they have 50k+ photos in their iCloud libraries, and Immich is trying to upload everything every time it's installed. This is after I manually imported all photos via CLI.

smnhdy avatar Oct 12 '23 06:10 smnhdy

Growing library of multiple family members with 10K photos at least combined. Hoping this comes out soon for iPhone users.

athornfam2 avatar Oct 13 '23 00:10 athornfam2

Noticed today on a fresh iOS install that the application properly detected photos that were already uploaded to the server, and the cloud checkmark appeared in the corner of the photos. That however didn't change the files to be sent to the server and the mobile application wanted to upload the full library to the server.

sgloutnikov avatar Oct 22 '23 22:10 sgloutnikov

Noticed today on a fresh iOS install that the application properly detected photos that were already uploaded to the server, and the cloud checkmark appeared in the corner of the photos. That however didn't change the files to be sent to the server and the mobile application wanted to upload the full library to the server.

Same thing on Android.

DX37 avatar Nov 18 '23 14:11 DX37

Strange... not the experience I get.

I reset my iPhone this week, and did a fresh install of 1.86 and it's now trying to upload all 50k photos...

smnhdy avatar Nov 18 '23 15:11 smnhdy

Strange... not the experience I get.

I reset my iPhone this week, and did a fresh install of 1.86 and it's now trying to upload all 50k photos...

Check for Duplicated Assets in Immich settings (local storage, I guess). The number of assets may be growing while uploading...

DX37 avatar Nov 18 '23 16:11 DX37

I don't see any comments that this feature is on the roadmap at all. Is there any official word on this?

smnhdy avatar Dec 29 '23 02:12 smnhdy

This is definitely planned, we just don't have that many people working on the mobile app.

bo0tzz avatar Dec 29 '23 09:12 bo0tzz

I'd like to add that it would probably also be better performance-wise to switch to a more modern hash algorithm that can run in parallel, e.g. BLAKE3 or XXH3

p7996619 avatar Jan 16 '24 19:01 p7996619