[Feature Request] - Duplicate functionality to support a primary folder location, rather than being based on age
**Is your feature request related to a problem? Please describe.**
Duplicates are a constant problem for me when people upload files, and I've tried to implement a process where there is always a primary folder holding the copy we actually use (determined not by age, but by storage location).
**Describe the solution you'd like**
The ability, when using the duplicate filter, to specify a folder (including the hierarchy below it) as the primary/master storage location.
**Describe alternatives you've considered**
I looked at standalone duplicate tools like fdupes and jdupes, but they do not seem to have a facility to designate one location as the master (and therefore protect its files from being deleted as duplicates).
**Additional context**
Nothing more to add, I don't think. It would be a great feature if it's possible to do.
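To make the request concrete, the desired behaviour might look something like the purely hypothetical config below. The original_location option does not exist in organize; it is invented here for illustration, and the folder names are placeholders.

```yaml
rules:
  - locations:
      - ~/Primary   # hypothetical master folder
      - ~/Uploads
    subfolders: true
    filters:
      - duplicate:
          original_location: ~/Primary  # hypothetical option: files under this tree are always the originals
    actions:
      - trash
```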
If I understand this correctly, you may want to look into the detect_original_by: "last_seen" option of the duplicate filter. last_seen marks the file that was found later as the duplicate. Example:
```yaml
rules:
  - locations:
      - ~/Downloads
      - ~/Desktop
    subfolders: true
    filters:
      - duplicate:
          detect_original_by: last_seen
    actions:
      - echo: "Found dup: {duplicate}"
```
Now if you have a duplicate in your Downloads and your Desktop, the desktop file is assumed to be the original.
Hi @tfeldmann - thanks for responding,
Unfortunately that approach only works in specific scenarios, and makes a key assumption that whichever came first is what you want to keep.
What I was hoping for was something that's location-based, not time-based. That way people can move a file into the location where they want it preserved, and have any other instance actioned, e.g. identified, moved, deleted, etc.
"Which came first" is not time-based but based on the order of the locations you provide. Organize walks through the locations in the given order, so files in the first location are always first_seen, so to speak.
Am I missing something here? Can you post an imaginary config to see how this should be specified?
I think the following is meant:
```yaml
---
rules:
  - locations:
      - ~/Desktop
      - ~/Downloads
    subfolders: true
    filters:
      - duplicate:
          detect_original_by: first_seen
    actions:
      - echo: "Found dup: {duplicate}"
```
Unfortunately, it doesn't quite solve the problem.
If, for example, duplicates already exist within ~/Desktop itself, they would also be handled, although the aim of the rule is probably to handle only files in ~/Downloads.
I had hoped to be able to solve this simply using the regex filter, but it only gets the file name and not the path.
```yaml
---
rules:
  - name: Remove all files that we already have in the archive
    locations:
      - ./
      - ~/.archive
    subfolders: true
    filter_mode: all
    filters:
      - duplicate
      - regex: '^\./.*$'
    actions:
      - delete
```
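One possible stopgap, assuming the python filter exposes the file's path as a pathlib.Path (treat that as an assumption about the API rather than a documented guarantee): list the archive first, so its copies count as the originals, and keep only matches outside the archive tree.

```yaml
---
rules:
  - name: Remove files that already exist in the archive
    locations:
      - ~/.archive   # walked first, so archive copies are the originals
      - ./
    subfolders: true
    filter_mode: all
    filters:
      - duplicate
      - python: |
          # Assumption: `path` is available here as a pathlib.Path.
          # Match only files outside the archive tree.
          return ".archive" not in path.parts
    actions:
      - delete
```

The ordering matters here: with the default first_seen detection, whichever location is walked first supplies the originals.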
So it would be nice if filters like name or regex could receive not only the filename example.foo, but also the path ./example.foo.
Alternatively, filters like name_path and regex_path would of course also be an option, but I think the first variant would be nicer.
```yaml
---
rules:
  - name: Remove all files that we already have in the archive
    locations:
      - ./
      - ~/.archive
    subfolders: true
    filter_mode: all
    filter_string: path
    filters:
      - duplicate
      - regex: '^\./.*$'
    actions:
      - delete
```
For example, you could set the filter_string option to name or path to determine whether filters such as name or regex receive the string example.foo or the path ./example.foo.
I was looking for it to ignore equal hashes, but I haven't found that it has that function yet.
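If the goal is to surface files that share a hash, recent organize versions include a hash filter, if I recall the docs correctly; a hedged sketch that only reports, with placeholder folders:

```yaml
rules:
  - locations:
      - ~/Downloads
    subfolders: true
    filters:
      - duplicate
      - hash          # assumption: the hash filter provides a {hash} placeholder
    actions:
      - echo: "Duplicate (hash {hash}): {path}"
```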