Files in dot-prefixed directories always Excluded. Not in Ignore List.

Open a-raccoon opened this issue 5 years ago • 17 comments

I'm attempting to scan thousands of directories, and in them hundreds of directories that are prefixed by a dot (period) such as .rsrc. dupeGuru insists on setting these directories to Excluded with no means for the user to disable this feature. This pattern does NOT appear in the Ignore List.

Why? These directories were created by 7zip when extracting hundreds of PE .exe resources into directory structures. The directory names are chosen by 7zip, not me.

a-raccoon avatar Dec 23 '18 08:12 a-raccoon

You can change the state of these folders by finding them in the main window and changing them to Normal. It will include sub-directories of those directories in the search as long as they do not start with '.'. These are normally private/hidden folders that the user has not created, so I imagine that might have been the original reason for setting them to Excluded by default. It should be possible to create an option to include these directories as normal directories by default, or perhaps add the '.' pattern to the ignore path by default and let the user remove it instead of having this behavior hardcoded.
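
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not dupeGuru's actual code: the names `State` and `state_of` and the `overrides` dictionary are hypothetical, and the inheritance rule is an assumption based only on the description in this comment.

```python
from enum import Enum
from pathlib import PurePosixPath


class State(Enum):
    NORMAL = "Normal"
    EXCLUDED = "Excluded"


def state_of(path: str, overrides: dict) -> State:
    """Resolve a folder's state (hypothetical logic, assumed from this thread).

    1. A state set explicitly by the user wins.
    2. Otherwise a folder whose own name starts with '.' is Excluded.
    3. Otherwise the folder inherits the state of its parent.
    """
    p = PurePosixPath(path)
    if str(p) in overrides:
        return overrides[str(p)]
    if p.name.startswith("."):
        return State.EXCLUDED
    if p.parent == p:  # reached the root: nothing left to inherit from
        return State.NORMAL
    return state_of(str(p.parent), overrides)


# With no user changes, the dot-folder and everything below it is excluded.
no_overrides = {}
# After the user sets the dot-folder to Normal in the main window:
user_overrides = {"/scan/app/.rsrc": State.NORMAL}
```

With `user_overrides`, `/scan/app/.rsrc/icons` resolves to Normal, while a nested `/scan/app/.rsrc/.cache` would still resolve to Excluded because its own name starts with a dot.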

arsenetar avatar Jan 11 '19 00:01 arsenetar

I am going to mark this as a feature, as it is possible (although a bit inconvenient) to search these folders, and this is going to require either adding an option or changing how this is set up, which at this point feels more like a feature to me.

arsenetar avatar Jan 11 '19 00:01 arsenetar

I just figured out a solution in the current version. Change all your added top-folders to Exclude, then change them back to Normal. It will go through and set all your dot-folders from Exclude to Normal as well.

You can kill this bug without fixing.

My dotted folders were coming from 7zip while extracting hundreds of .exe software installers. eg: the folder .rsrc contains all the PE resources. Anyway, I got it sorted out.

a-raccoon avatar Jan 11 '19 03:01 a-raccoon

@a-raccoon Thanks for your solution. It would still be nice to have the option to change this behavior.

Dobatymo avatar Jan 25 '19 01:01 Dobatymo

> I just figured out a solution in the current version. Change all your added top-folders to Exclude, then change them back to Normal. It will go through and set all your dot-folders from Exclude to Normal as well.

This doesn't work for me. Wonder what I am doing differently. I'm using the Linux version. I, too, would like an option where I can search all files regardless of whether or not they are hidden or system files. Maybe a "techy" mode or something.

ghost avatar Jun 21 '19 20:06 ghost

Is there any way we can have an option to handle dotfiles and dot directories as normal files?

thinuspollard avatar Sep 23 '21 11:09 thinuspollard

I suppose we could remove these lines and only rely on the Exclusion List (regular expressions).

glubsy avatar Oct 24 '21 10:10 glubsy

I think it does have a use case; in any case, people are expecting it to work this way.

Can we make it an option?

thinuspollard avatar Oct 24 '21 17:10 thinuspollard

I am fine with removing the lines @glubsy pointed out or only using them if for some reason there is not an exclude list. The exclude list by default adds those directories, so the behavior by default would remain the same. There is some state caching being done that does not always refresh correctly without additional changes, though. If a folder has been loaded before, it will "stick" to the non-Normal state it was first loaded with. So there is a bit more work needed to sort out some oddities with that caching / reloading without cache when the exclude list is changed.
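
To illustrate the caching concern, here is a minimal sketch of a state cache that is cleared whenever the exclude list changes. It is not dupeGuru's implementation: `StateCache`, `set_patterns` and `state` are hypothetical names, and matching rules against a bare folder name is an assumption.

```python
import re


class StateCache:
    """Hypothetical per-folder state cache, invalidated when the rules change."""

    def __init__(self, patterns):
        self._cache = {}
        self._rules = []
        self.set_patterns(patterns)

    def set_patterns(self, patterns):
        # Recompile the exclude rules and drop every cached state.
        # Without the clear(), a folder first seen as "Excluded" would
        # "stick" to that state even after its rule is removed.
        self._rules = [re.compile(p) for p in patterns]
        self._cache.clear()

    def state(self, name):
        if name not in self._cache:
            excluded = any(rule.match(name) for rule in self._rules)
            self._cache[name] = "Excluded" if excluded else "Normal"
        return self._cache[name]


cache = StateCache([r"^\..*"])
before = cache.state(".rsrc")  # computed under the dot rule
cache.set_patterns([])         # user removes the rule
after = cache.state(".rsrc")   # recomputed, not served from the stale cache
```

Clearing the whole cache is the simplest choice for a sketch; a real fix might need to preserve states the user set by hand.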

arsenetar avatar Nov 24 '21 05:11 arsenetar

> @a-raccoon Thanks for your solution. It would still be nice to have the option to change this behavior.

@Dobatymo It wouldn't "just be nice" to have an option to include/ignore dot folders and files; IMHO it is essential. You can work around this issue for folders (really cumbersome if you have more than a few in the base folder), but dot files in the base folder seem to be ignored in any case! 😮 IMHO this is very bad practice, if not a bug: imagine a casual user who thinks all files were compared (because none were ignored, either in the list or via options) and so he deletes the allegedly "empty" base folders manually. Perhaps the hidden files were essential, have no duplicates, and are now lost forever!

IMHO it doesn't have to be the default to include dot files/folders, although personally I would prefer it this way: I would rather see "too many" files at first and exclude them than not see them and wonder what happened afterwards.

DJCrashdummy avatar Mar 07 '22 12:03 DJCrashdummy

Does the "folder mode" consider dot files/folders? Or ignore them?

I deleted some folders with "folder mode" and then, for safety, restored them from the trash bin and re-deleted them with "content mode". As a result, none of the dot-files were deleted the second time. I don't know whether these folders were deleted the first time (along with the dot-files they contained) because the dot-files were checked and found identical, or because the scan ignored them...

mosiser avatar Jul 16 '22 14:07 mosiser

@mosiser Folder mode should compare the contents of the folders completely; this means all files within those folders will be considered. Content mode will follow the exclusion filters and ignore lists when deciding which files not to scan. By default the exclusion list contains a rule to ignore all files starting with "."; you can disable that rule if you would like.
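
As a rough sketch of what disabling that rule changes, the snippet below filters a list of paths against a list of exclusion patterns. It is illustrative only: `files_to_scan` is a hypothetical helper, `DEFAULT_EXCLUSIONS` contains just the dot rule discussed in this thread (the real default list has other entries), and testing every path component against each rule is an assumption.

```python
import re
from pathlib import PurePosixPath

# Only the dot rule from this thread; other default entries are omitted.
DEFAULT_EXCLUSIONS = [r"^\..*"]


def files_to_scan(paths, exclusions):
    """Keep the paths in which no component matches any exclusion rule."""
    rules = [re.compile(pattern) for pattern in exclusions]
    kept = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if any(rule.match(part) for rule in rules for part in parts):
            continue  # a file or parent folder name matched a rule
        kept.append(path)
    return kept


paths = ["site/index.html", "site/.htaccess", "site/.git/config"]
with_rule = files_to_scan(paths, DEFAULT_EXCLUSIONS)  # dot rule enabled
without_rule = files_to_scan(paths, [])               # dot rule disabled
```

With the rule enabled only `site/index.html` remains; with it disabled all three paths are scanned.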

arsenetar avatar Jul 17 '22 00:07 arsenetar

> folder mode should compare the contents of the folders completely, this means all files within those folders will be considered.

Thanks for the clarification, I couldn't find this info anywhere...

> Content mode will follow the exclusion filters and ignore lists when deciding which files to not scan. By default the exclusion list contains a rule to ignore all files starting with . you can disable that rule if you would like.

Sorry, I didn't know how the Regex system works, I saw the exclusion list, but I didn't realise the dot-file exclusion was one of the options listed... All right now. Thanks again.

mosiser avatar Jul 25 '22 15:07 mosiser

> imagine a casual user who thinks all files were compared [...] and so he deletes the allegedly "empty" base folders manually

I did this. 😞 I did a Content scan under the assumption no files were being ignored. I didn't check the "Exclusion Filters" list (I had assumed it was empty by default).

It was only after I noticed some .htaccess files were not included in the dupeGuru scan results--and then manually confirmed the files did exist in both scanned folders--that I searched for and found this Issue and learned about this default behavior.

One UI change that I think would make this default behavior more apparent to users is to have the "Exclusion Filters" tab be open by default (not necessarily active, but open instead of closed). I think that will make it more likely that new users check it.

image

gitname avatar Jul 31 '22 23:07 gitname

Here are the default "Exclusion Filters" entries, for reference:

image

The first one (^\..*) is the one that refers to dot-files.
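
For readers unfamiliar with regular expressions, here is a small sketch of what that pattern matches. It assumes the rule is tested against a bare file or folder name rather than a full path; `is_excluded` is a hypothetical helper, not a dupeGuru function.

```python
import re

# The dot-file rule quoted above.
DOT_RULE = re.compile(r"^\..*")


def is_excluded(name: str) -> bool:
    """Return True if a single file or folder name begins with a dot."""
    return DOT_RULE.match(name) is not None


names = [".htaccess", ".rsrc", ".purple", "notes.txt", "archive.tar.gz"]
excluded = [n for n in names if is_excluded(n)]
```

Here `excluded` contains `.htaccess`, `.rsrc` and `.purple`; names that merely contain a dot, such as `notes.txt`, are not matched.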

Users may find it helpful to see a "Description" or "Comment" column (whose value is optional) in the table. For this particular entry, the "Description" value could be "Filename begins with dot".

gitname avatar Jul 31 '22 23:07 gitname

> have the "Exclusion Filters" tab be open by default (not necessarily active, but open instead of closed)

That's worth considering indeed.

The "exclusion filters" is a fairly advanced feature which was essentially implemented to circumvent the fact that dot files were excluded by default (it was hard coded if I remember correctly).

glubsy avatar Aug 01 '22 17:08 glubsy

I tried all the solutions (except for editing directories.py) above, and dupeguru is still excluding all the 'hidden' directories.

This is a problem, for example, if you're searching directories that have log files in the hidden dirs (such as Pidgin keeping its log files in ~/.purple/logs). If you're trying to clean up duplicates, it won't work.

jelabarre59 avatar Jan 03 '24 17:01 jelabarre59