django-filer
Proposal: Enable duplicate detection
Wanted to gauge interest in adding duplicate file detection.
If enabled (via an opt-in setting, e.g. settings.FILER_PREVENT_DUPLICATES), then whenever a user uploads a file (via drag and drop, the CMS filer plugin, or a regular upload in the admin), filer would check whether that file already exists and, instead of saving it again, redirect the user to the existing copy.
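A minimal sketch of the proposed check, assuming files are identified by the SHA-1 of their content (the sha1 field mentioned later in this thread) and using an in-memory dict as a stand-in for a database lookup. `PREVENT_DUPLICATES`, `UploadIndex` and `upload` are hypothetical names for illustration, not django-filer API.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical stand-in for an opt-in setting such as
# settings.FILER_PREVENT_DUPLICATES.
PREVENT_DUPLICATES = True


def sha1_of(data: bytes) -> str:
    """Hex SHA-1 digest of the uploaded bytes."""
    return hashlib.sha1(data).hexdigest()


@dataclass
class StoredFile:
    pk: int
    name: str
    sha1: str


@dataclass
class UploadIndex:
    """In-memory stand-in for looking up stored files by their sha1."""
    by_hash: dict = field(default_factory=dict)
    next_pk: int = 1

    def upload(self, name: str, data: bytes):
        """Save a new file, or hand back the existing copy.

        Returns (file, created). created=False tells the caller to redirect
        the user to `file` instead of saving a second copy.
        """
        digest = sha1_of(data)
        if PREVENT_DUPLICATES and digest in self.by_hash:
            return self.by_hash[digest], False
        stored = StoredFile(pk=self.next_pk, name=name, sha1=digest)
        self.next_pk += 1
        # Index only the first file per hash so lookups resolve to the original.
        self.by_hash.setdefault(digest, stored)
        return stored, True


index = UploadIndex()
original, created = index.upload("report.pdf", b"same bytes")   # created is True
copy, created = index.upload("report-copy.pdf", b"same bytes")  # created is False, copy is original
```

Returning the existing object rather than raising keeps both behaviours open: an admin view could redirect to it, or a caller could reuse it silently.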
This would only catch duplicates uploaded after the feature was enabled, so a bonus feature would be a way to 'find duplicates' that only lists them, with no further actions. 'Merging' duplicates could be complicated depending on how the site uses files, so it should likely be a manual process.
How does this help? In my case, we have many files that we want to show up in site search results. It's easy enough to apply a distinct filter on the sha1 when indexing the content so that duplicates don't appear in search, but it's not predictable which file's title/description the user should edit to change the search result text. We'd like to prevent duplicate uploads altogether for this project. A side benefit is saving storage space.
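Both the 'find duplicates' report and the distinct-on-sha1 indexing boil down to grouping files by hash. A plain-Python sketch, assuming each record exposes `pk` and `sha1`; with Django's ORM the grouping would be a database query instead. `FileRecord`, `duplicate_groups` and `canonical_files` are illustrative names only, and "lowest pk wins" is just one possible rule for making the indexed file predictable.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class FileRecord(NamedTuple):
    pk: int
    sha1: str
    title: str


def group_by_hash(files: Iterable[FileRecord]) -> dict[str, list[FileRecord]]:
    """Bucket file records by their content hash."""
    groups: dict[str, list[FileRecord]] = defaultdict(list)
    for record in files:
        groups[record.sha1].append(record)
    return dict(groups)


def duplicate_groups(files: Iterable[FileRecord]) -> dict[str, list[FileRecord]]:
    """'Find duplicates' report: only hashes shared by two or more files."""
    return {h: group for h, group in group_by_hash(files).items() if len(group) > 1}


def canonical_files(files: Iterable[FileRecord]) -> list[FileRecord]:
    """One record per hash for the search index.

    Always picking the lowest pk (the oldest upload) makes the choice
    predictable, so editors know which record's title/description is indexed.
    """
    return [min(group, key=lambda r: r.pk) for group in group_by_hash(files).values()]
```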
If there's interest in this, I'll move forward with a pull request. I would like to have this feature in a project I'm working on, but wanted to get feedback on whether it suits django-filer first. If the flow described above is not suitable, are there other acceptable ways to expose the duplicate detection functionality available on FileManager to end users?
Thanks!
Before 1.0 we used to list duplicates (based on the SHA-1 hash), but we removed that because it was unclear to novice users what they could actually do with the information.
Generally, I'd love to see such functionality in django-filer. We'll have to explore possible pitfalls, though: it also makes it more likely that someone replaces a file and causes the new file to be used in more locations than they expect.
Anyhow, if it can be turned off through configuration, I think a simple version that just avoids duplicates would be great. Then we can build on that and see what else is needed to make it feasible to be active by default.
- Resolving duplicates could also be an option: #627
- I really like the idea of redirecting to the existing file!
+1
@jsma any news on this?
Since there is the option to upload directly into "unfiled files", I see many users just uploading and forgetting. I already implemented a silent "just use the existing file if the sha1 matches" version on a very old filer, and it works like a charm (roughly what @jsma proposes, if I understand correctly). We could implement this and just disable it by default. @stefanfoulis @yakky?
SEO-wise, it's catastrophic to have many versions of the same file.
@stefanfoulis @benzkji @jsma
Hi, I was wondering whether there are any plans to build this feature in the near future? Hope to hear something.
Sorry, just not enough resources on my side... it doesn't look like a priority, sadly.
@benzkji Thank you. Maybe a bit off topic, but is there a way to write a custom file path scheme instead of the randomized and date-based variants that are currently in filer?
@edwinoko check https://django-filer.readthedocs.io/en/latest/settings.html#filer-storages there is an example for what you want.
I've written some of these; it's very easy: https://github.com/rouxcode/django-filer-addons/blob/master/filer_addons/filer_utils/generate_folder_and_filename.py
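A sketch of such a generator, following the `(instance, filename)` signature that Django `upload_to` callables receive, here storing uploads under the owner's username. `by_owner` is a hypothetical name, and the sketch assumes the instance has an `owner` attribute with a `username`; wiring it in through the `UPLOAD_TO` entry of `FILER_STORAGES` follows the settings docs linked above, which should be checked for the exact keys.

```python
import posixpath
from types import SimpleNamespace


def by_owner(instance, filename):
    """Hypothetical upload_to callable producing '<username>/<filename>'.

    Falls back to 'anonymous' when the file has no owner, and keeps only
    the last path component of the uploaded name.
    """
    owner = getattr(instance, "owner", None)
    username = getattr(owner, "username", None) or "anonymous"
    return posixpath.join(username, posixpath.basename(filename))


photo = SimpleNamespace(owner=SimpleNamespace(username="edwin"))
print(by_owner(photo, "holiday.jpg"))  # prints edwin/holiday.jpg
```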
@benzkji that is a cool collection of path generators!
Thanks @benzkji for the awesome collection. I made my own that stores the uploaded image under the user's name. Now I just need to find out how to prevent duplicates from being added for each user: it seems to automatically add a random string to the filename. If anyone can point me in the right direction, that would be very helpful. Thanks so far!
Done that as well, but it's more of a proof of concept and not used in production (well, in one project ;-)). It's more of a hack, using signals: https://github.com/rouxcode/django-filer-addons/blob/master/filer_addons/filer_signals/models.py#L55
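A sketch of the per-user check, not the linked add-on's code: before keeping a new upload, look for a file with the same owner and the same SHA-1 of its content. In a signal handler this would be a query filtering on owner and sha1; here a list stands in for the table, and `OwnedFile` and `existing_copy` are hypothetical names.

```python
import hashlib
from typing import Iterable, NamedTuple, Optional


class OwnedFile(NamedTuple):
    pk: int
    owner_id: int
    sha1: str


def existing_copy(files: Iterable[OwnedFile], owner_id: int, data: bytes) -> Optional[OwnedFile]:
    """Return this owner's earliest file with identical content, if any.

    Matching on the content hash rather than the filename means a random
    suffix on stored names does not hide the duplicate.
    """
    digest = hashlib.sha1(data).hexdigest()
    matches = [f for f in files if f.owner_id == owner_id and f.sha1 == digest]
    return min(matches, key=lambda f: f.pk, default=None)
```

A non-None result is the copy to point the user to instead of keeping the new upload; files owned by other users are deliberately ignored.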
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
related: #1285
This will now be closed due to inactivity, but feel free to reopen it.