django-filer

Proposal: Enable duplicate detection

Open jsma opened this issue 9 years ago • 14 comments

Wanted to gauge interest in adding duplicate file detection.

If enabled (an opt-in setting e.g. settings.FILER_PREVENT_DUPLICATES), when a user uploads a file (whether drag and drop, CMS filer plugin, or regular upload in the admin) it would detect if that file already exists and instead of saving the file, would redirect the user to the existing copy of the file.
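A minimal sketch of that upload-time check, in plain Python rather than filer's own code. `PREVENT_DUPLICATES` stands in for the proposed `settings.FILER_PREVENT_DUPLICATES`, and `existing_by_sha1` is a hypothetical stand-in for a database lookup on the stored sha1; `handle_upload` is likewise an illustrative name, not an existing filer function.

```python
import hashlib
import io

# Stand-in for the proposed opt-in setting (hypothetical, not an existing filer setting).
PREVENT_DUPLICATES = True


def sha1_of(fileobj, chunk_size=64 * 1024):
    """Hash a binary file-like object in chunks, then rewind it for later saving."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    fileobj.seek(0)
    return digest.hexdigest()


def handle_upload(fileobj, name, existing_by_sha1):
    """Return (stored_name, created); reuse the stored file when content matches."""
    sha1 = sha1_of(fileobj)
    if PREVENT_DUPLICATES and sha1 in existing_by_sha1:
        # Duplicate content: point the caller at the existing copy instead of saving.
        return existing_by_sha1[sha1], False
    # First time this content is seen (or detection is off): record and "save" it.
    existing_by_sha1.setdefault(sha1, name)
    return name, True


index = {}  # hypothetical in-memory replacement for the sha1 lookup
first = handle_upload(io.BytesIO(b"same bytes"), "report.pdf", index)
second = handle_upload(io.BytesIO(b"same bytes"), "report-copy.pdf", index)
```

Here `second` refers back to `report.pdf`, which is the point where a real view could redirect the user to the existing file.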

This would only catch duplicate files uploaded after this feature was enabled, so a bonus feature would be to expose a 'find duplicates' function and nothing more. 'Merging' duplicates could be complicated depending on how the site uses files, so it should likely be a manual process.
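As a rough illustration of what a 'find duplicates' listing could compute, here is a plain-Python sketch that groups files by sha1. The `(name, sha1)` tuples and the function name are assumptions made for the example, not filer's data model or API.

```python
from collections import defaultdict


def find_duplicate_groups(files):
    """Group (name, sha1) pairs by sha1 and keep only hashes seen more than once."""
    groups = defaultdict(list)
    for name, sha1 in files:
        if sha1:  # skip files that have no stored hash
            groups[sha1].append(name)
    return {sha1: names for sha1, names in groups.items() if len(names) > 1}


files = [
    ("a.pdf", "abc"),
    ("b.pdf", "def"),
    ("copy-of-a.pdf", "abc"),
    ("nohash.bin", ""),
]
duplicates = find_duplicate_groups(files)
```

The result only reports groups; deciding which copy to keep is left to the user, in line with the manual merging suggested above.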

How does this help? In my case we have many files which we want to show up in site search results. It's easy enough to do a distinct filter on the sha1 when indexing the content to prevent duplicates from appearing in search, but it's not predictable which file's title/description the user should edit to change the search result text. We'd like to prevent duplicate uploads altogether for this project. An upside is savings on storage space.

If there's interest in this, I'll move forward with a pull request. I would like to have this feature in a project I'm working on but wanted to get feedback on suitability for this project first. If the flow described above is not suitable, are there other acceptable ways to expose the duplicate detection functionality available on FileManager to end users?

Thanks!

jsma avatar Feb 22 '16 21:02 jsma

Pre-1.0 we used to list the duplicates (based on the sha-hash). But we removed it, because it was unclear to novice users what they could actually do with that information.

Generally I'd love to see such functionality in django-filer. We'll have to explore possible pitfalls though, because it also makes it more likely that someone replaces a file and causes the new file to be used in more locations than they expect.

Anyhow, if it can be turned off through configuration, I think a simple version that just avoids duplicates would be great. Then we can build on that and see what else is needed to make it feasible to be active by default.

stefanfoulis avatar Feb 23 '16 13:02 stefanfoulis

  • resolve duplicates could also be an option: #627
  • I really like the idea of redirecting to the existing file!

benzkji avatar Feb 26 '16 10:02 benzkji

+1

jakob-o avatar Dec 06 '16 15:12 jakob-o

@jsma any news on this?

benzkji avatar May 08 '17 12:05 benzkji

Since there is the option to upload directly into "unfiled files", I see many users just uploading and forgetting. I already implemented a silent "just use the existing file if the sha1 matches" version on a very old filer, works like a charm (about what @jsma proposes, if I understand correctly). We could implement this, and just disable it by default @stefanfoulis @yakky ?

SEO-wise it's catastrophic to have many versions of the same file.

benzkji avatar May 09 '17 08:05 benzkji

@stefanfoulis @benzkji @jsma

Hi, I was wondering if there is any intention to create this feature in the near future? Hope to hear something.

edwinoko avatar Jul 16 '18 10:07 edwinoko

Sorry, just not enough resources on my side... it doesn't look like a priority, sadly.

benzkji avatar Jul 16 '18 18:07 benzkji

@benzkji Thank you. Maybe a bit off topic, but is there a way to write a custom file path generator instead of the randomized and date-based variants that are currently in filer?

edwinoko avatar Jul 17 '18 09:07 edwinoko

@edwinoko check https://django-filer.readthedocs.io/en/latest/settings.html#filer-storages there is an example for what you want.

I've written some of these, it's very easy: https://github.com/rouxcode/django-filer-addons/blob/master/filer_addons/filer_utils/generate_folder_and_filename.py
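For illustration only, a per-user generator might look like the sketch below. It assumes the generator is called like Django's `upload_to` callables, with `(instance, filename)`, and that the file instance exposes an `owner` user object; `by_owner` and the `"unowned"` fallback are made-up names, and usernames are not sanitized here.

```python
import posixpath
from types import SimpleNamespace


def by_owner(instance, filename):
    """Build '<username>/<filename>' from the file's owner (assumed attribute)."""
    owner = getattr(instance, "owner", None)
    username = owner.get_username() if owner is not None else "unowned"
    # Keep only the base name so a client-supplied path cannot add directories.
    return posixpath.join(username, posixpath.basename(filename))


# Stand-in objects for a quick check outside of Django.
fake_file = SimpleNamespace(owner=SimpleNamespace(get_username=lambda: "edwin"))
path = by_owner(fake_file, "photo.jpg")
```

The dotted path to such a function would then go into the `UPLOAD_TO` entry described in the settings docs linked above.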

benzkji avatar Jul 17 '18 15:07 benzkji

@benzkji that is a cool collection of path generators!

stefanfoulis avatar Jul 18 '18 07:07 stefanfoulis

Thanks @benzkji for the awesome collection. I did make my own that stores the uploaded image under the user's name. Now I just need to find out how to prevent duplicates from being added for each user. It seems to automatically add a random string to the filename. If anyone can point me in the right direction, that would be very helpful. Thanks so far!

edwinoko avatar Jul 18 '18 08:07 edwinoko

Done that as well, but it's more of a proof of concept and not used in production (well, in one project ;-)... more like a hack, using signals: https://github.com/rouxcode/django-filer-addons/blob/master/filer_addons/filer_signals/models.py#L55

benzkji avatar Jul 18 '18 20:07 benzkji

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 28 '22 21:07 stale[bot]

related: #1285

benzkji avatar Aug 01 '22 10:08 benzkji

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 31 '22 00:10 stale[bot]

This will now be closed due to inactivity, but feel free to reopen it.

stale[bot] avatar Nov 28 '22 19:11 stale[bot]