[Feature Request] - Duplicate functionality to support a primary folder location, rather than being based on age
**Is your feature request related to a problem? Please describe.**
Duplicates are a constant problem for me when people upload files, and I've tried to implement a process where there is always a primary folder holding the copy we actually use (determined not by age, but by storage location).
**Describe the solution you'd like**
The ability, when using the duplicate filter, to specify a folder (including the hierarchy below it) as the primary/master storage location.
**Describe alternatives you've considered**
I looked at standalone duplicate tools like fdupes and jdupes, but they do not seem to have a facility to designate one location as the master (and therefore protect its files from being deleted as duplicates).
**Additional context**
Nothing more to add, I don't think. It would be a great feature if it's possible to do.
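To make the request concrete, the desired behaviour might look something like the purely hypothetical config below. The original_location option does not exist in organize; it is invented here for illustration, and the folder names are placeholders.

```yaml
rules:
  - locations:
      - ~/Primary   # hypothetical master folder
      - ~/Uploads
    subfolders: true
    filters:
      - duplicate:
          original_location: ~/Primary  # hypothetical option: files under this tree are always the originals
    actions:
      - trash
```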
If I understand this correctly, you may want to look into the detect_original_by: "last_seen" option of the duplicate filter. last_seen marks the file that was found later as the duplicate. Example:
```yaml
rules:
  - locations:
      - ~/Downloads
      - ~/Desktop
    subfolders: true
    filters:
      - duplicate:
          detect_original_by: last_seen
    actions:
      - echo: "Found dup: {duplicate}"
```
Now if you have a duplicate in your Downloads and your Desktop, the desktop file is assumed to be the original.
Hi @tfeldmann - thanks for responding,
Unfortunately that approach only works in specific scenarios, and makes a key assumption that whichever came first is what you want to keep.
What I was hoping for was something that's location-based, not time-based. That way people can move a file into the location where they want it preserved, and have any other instance actioned, e.g. identified, moved, deleted, etc.
"Which came first" is not time-based but based on the order of the locations you provide. Organize walks through the locations in the given order, so files in the first location are always first_seen, so to speak.
Am I missing something here? Can you post an imaginary config to see how this should be specified?
I think the following is meant:
```yaml
---
rules:
  - locations:
      - ~/Desktop
      - ~/Downloads
    subfolders: true
    filters:
      - duplicate:
          detect_original_by: first_seen
    actions:
      - echo: "Found dup: {duplicate}"
```
Unfortunately, it doesn't quite solve the problem.
If, for example, duplicates already exist within ~/Desktop itself, they would also be handled, although the aim of the rule is probably to handle only files in ~/Downloads.
I had hoped to be able to solve this simply using the regex filter, but it only gets the file name and not the path.
```yaml
---
rules:
  - name: Remove all files that we already have in the archive
    locations:
      - ./
      - ~/.archive
    subfolders: true
    filter_mode: all
    filters:
      - duplicate
      - regex: '^\./.*$'
    actions:
      - delete
```
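One possible stopgap, assuming the python filter exposes the file's path as a pathlib.Path (treat that as an assumption about the API rather than a documented guarantee): list the archive first, so its copies count as the originals, and keep only matches outside the archive tree.

```yaml
---
rules:
  - name: Remove files that already exist in the archive
    locations:
      - ~/.archive   # walked first, so archive copies are the originals
      - ./
    subfolders: true
    filter_mode: all
    filters:
      - duplicate
      - python: |
          # Assumption: `path` is available here as a pathlib.Path.
          # Match only files outside the archive tree.
          return ".archive" not in path.parts
    actions:
      - delete
```

The ordering matters here: with the default first_seen detection, whichever location is walked first supplies the originals.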
So it would be nice if filters like name or regex could receive not only the filename example.foo, but also the path ./example.foo.
Alternatively, filters like name_path and regex_path would of course also be an option, but I think the first variant would be nicer.
```yaml
---
rules:
  - name: Remove all files that we already have in the archive
    locations:
      - ./
      - ~/.archive
    subfolders: true
    filter_mode: all
    filter_string: path
    filters:
      - duplicate
      - regex: '^\./.*$'
    actions:
      - delete
```
For example, you could set the filter_string option to name or path to determine whether filters such as name or regex receive the string example.foo or the path ./example.foo.
I was looking for it to ignore equal hashes, but I haven't found that it has that function yet.
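If the goal is to surface files that share a hash, recent organize versions include a hash filter, if I recall the docs correctly; a hedged sketch that only reports, with placeholder folders:

```yaml
rules:
  - locations:
      - ~/Downloads
    subfolders: true
    filters:
      - duplicate
      - hash          # assumption: the hash filter provides a {hash} placeholder
    actions:
      - echo: "Duplicate (hash {hash}): {path}"
```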