Remove duplicates on large number of files

Open im7mortal opened this issue 1 year ago • 8 comments

I am trying to sync my Google zip file with ente. I tried this 2 years ago (#ente-io/photos-web/issues/243) and had problems. It turned out that every attempt created new files, so now I have 2 or 3 duplicates for at least 15,000 files.

  1. I just don't want to do it again. It's exhausting and the interface doesn't help.
  2. Reviewing the duplicates manually is a huge pain. I tried, and after 1 hour the app crashed because it ran out of memory.
  3. I can't trust ente, because you compare duplicates by size. I saw many "duplicates" that included photos that are important to me.

Here is how I see a possible solution:

  1. Maybe limit duplicates to chunks, for example 1000 duplicates. Then it would be possible to remove that chunk and start a new one.
  2. The memory crash should be debugged.
  3. Add a more sophisticated filter to the duplicate search: name, size. I believe you should also have some kind of hash?
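
To make suggestions 1 and 3 concrete, here is a minimal sketch of hash-based duplicate detection on local files, with results handed out in chunks. It is not how ente works internally; the function names (`file_digest`, `duplicate_groups`, `in_chunks`) and the choice of SHA-256 are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict
from itertools import islice
from pathlib import Path


def file_digest(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def duplicate_groups(root):
    """Yield sorted lists of paths under root whose contents are byte-identical."""
    # Cheap pre-filter: only files of equal size can be identical.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        # Equal size is not enough, so confirm with a content hash.
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        for group in by_hash.values():
            if len(group) > 1:
                yield sorted(group)


def in_chunks(groups, chunk_size=1000):
    """Yield lists of at most chunk_size duplicate groups at a time."""
    it = iter(groups)
    while chunk := list(islice(it, chunk_size)):
        yield chunk
```

Something like `for chunk in in_chunks(duplicate_groups("/path/to/export")): ...` would then let a UI show and clear 1000 groups at a time instead of holding every duplicate in memory at once.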

Some more background: I tried to migrate to ente in 2021 but couldn't because of the bugs. Since then I have had my phone connected to ente, so I can't just remove all files and start the Google zip import from scratch.

im7mortal avatar Jan 17 '24 01:01 im7mortal

20240116_214541

Like here. The photo on the left is a trash screenshot, but it has the same size as an important photo.

im7mortal avatar Jan 17 '24 01:01 im7mortal

The dedupe logic is the same across the web and desktop. Moving the issue to the photos-web repo.

abhinavkgrd avatar Jan 18 '24 02:01 abhinavkgrd

@im7mortal hey, we've updated the implementation to use file-hashes instead of file-sizes + creation-times, since the former was more reliable.

You might still be seeing the older flow, since we might not have computed the hashes for items that were uploaded in the past. If you clear your library and re-upload, the experience will be as you expect it to be.

If you run into a crash, please share logs (Help > View logs) with [email protected], we'd love to take a look!

vishnukvmd avatar Jan 18 '24 18:01 vishnukvmd

> If you clear your library and re-upload, the experience will be as you expect it to be.

As I mentioned, I also used ente.io on my phone for some time, and I have other photos there which I can't lose.

Is there an API to update the hash? I could run a long task on my laptop.

im7mortal avatar Jan 20 '24 16:01 im7mortal

We don't have an API to update the hash at the moment :(

Could you try running the de-dupe on your mobile device? There we only process those items that have a hash available.
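
As a rough illustration of what "only process items that have a hash" implies (a sketch, not ente's actual code; the `id` and `hash` field names are hypothetical), items uploaded before hashes were computed are simply skipped and can never show up as duplicates:

```python
from collections import defaultdict


def exact_duplicates(items):
    """Group item ids by hash, ignoring items that have no hash.

    Returns (groups, skipped_count).
    """
    by_hash = defaultdict(list)
    skipped = 0
    for item in items:
        digest = item.get("hash")
        if digest is None:
            skipped += 1  # older upload without a stored hash
            continue
        by_hash[digest].append(item["id"])
    # Only hashes shared by two or more items count as duplicates.
    groups = [ids for ids in by_hash.values() if len(ids) > 1]
    return groups, skipped
```

Reporting the skipped count alongside the groups would make it visible how much of the library was not compared at all.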

In the future we intend to provide a way to de-dupe "similar" images, that would solve for this use case, but we don't have a clear timeline for that at the moment.

vishnukvmd avatar Jan 23 '24 16:01 vishnukvmd

I tried it overnight in the Android app. It showed me a spinner; I waited some time, then went to sleep. As I just mentioned in photos-app#1380, it wasn't clear whether ente was doing something or not.

When I got up, it showed that it had found only 2 duplicates, which I had uploaded this week.

im7mortal avatar Feb 04 '24 13:02 im7mortal

Sounds like there are only 2 photos that are exact duplicates (with a matching hash)?

vishnukvmd avatar Feb 05 '24 04:02 vishnukvmd