Feature request: Duplicate search

Open Dhyfer1 opened this issue 3 years ago • 18 comments

Hi @namazso

Kudos for your great work with OpenHashTab 👏 I use HashTab whenever I need it and it is a good program, but it cannot find the hash of a folder with its contents as OpenHashTab does. It is amazing that OpenHashTab is inspired by HashTab and at the same time you have never used it.

As I said in the title, I still don't fully understand this program, but mostly I want to know how I can compare the hashes of two or more different files located in two different paths or in the same folder.

In HashTab it is easy: it has a button called 'Compare a file'. Clicking it opens the Open dialog box, where you select the file to compare. If the file has the same hash values it shows a check sign, otherwise it shows a negative sign, as you can see in both images.

[image: Untitled]

So something as obvious as comparing two or more different files with OpenHashTab is something I have not understood. When I select two or more files in the same folder I can see their hash values; I can then double-click any hash to copy it, and the copied hash appears in the 'Check against' field. From then on, how can I compare that copied hash with another one? The OpenHashTab image that appears in your repository shows it with some hashes framed in green and some framed in red, and one framed in brown, I don't know what those colors mean in OpenHashTab.

I hope for your help, it may be something obvious in OpenHashTab but I haven't understood it yet, so forgive this newbie.

Dhyfer1 avatar Feb 12 '21 20:02 Dhyfer1

Comparing files that way isn't currently a feature, mainly because when I initially developed this I hadn't thought about hashing files after the initial batch was finished, and supporting that would need some refactoring. What you can do instead is hash File 1, copy its hash (double-click the hash), then hash File 2 (the hash will be automatically pasted into the field):

https://user-images.githubusercontent.com/8676443/107834093-960e3400-6d95-11eb-8e7c-713d43abf58a.mp4

The OpenHashTab image that appears in your repository shows it with some hashes framed in green and some framed in red, and one framed in brown, I don't know what those colors mean in OpenHashTab.

This one is about sum files: when you try hashing one such file, OpenHashTab will detect this, and hash and compare the files listed in the file instead. The color coding is rather simple:

  • Green = hash matches
  • Brown = hash matches but is insecure algorithm
  • Red = none of the hashes match
  • Text is red = unreadable (hash will be error message instead)
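The color coding described above can be sketched roughly as follows. This is only an illustration, not OpenHashTab's actual code: the function name `classify`, the set of algorithms tried, and the choice of which algorithms count as "insecure" are all assumptions.

```python
import hashlib
from pathlib import Path

# Algorithms treated as insecure for this illustration (an assumption).
INSECURE = {"md5", "sha1"}


def classify(path, expected_hex, algorithms=("md5", "sha1", "sha256")):
    """Return 'green', 'brown', 'red' or 'unreadable' for one sum-file entry."""
    try:
        data = Path(path).read_bytes()
    except OSError:
        # The file could not be read at all.
        return "unreadable"
    expected = expected_hex.strip().lower()
    for name in algorithms:
        if hashlib.new(name, data).hexdigest() == expected:
            # A match with a weak algorithm is flagged differently.
            return "brown" if name in INSECURE else "green"
    # None of the tried algorithms produced the expected hash.
    return "red"
```

Each line of a sum file would be passed through `classify` with the file name and the hash listed next to it.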

namazso avatar Feb 12 '21 23:02 namazso

Comparing files that way isn't currently a feature, mainly because when I initially developed this I hadn't thought about hashing files after the initial batch was finished, and supporting that would need some refactoring. What you can do instead is hash File 1, copy its hash (double-click the hash), then hash File 2 (the hash will be automatically pasted into the field):

1613173845.mp4

So OpenHashTab is more an app for displaying hashes than for comparison? Well, about the video: if I were to do the same with, say, 10 files, I would have to close and open the window 10 times! I think you should consider adding an option to compare files as HashTab. For me it is a feature that I need to see in OpenHashTab, because that function is the reason I use HashTab, and I only resort to OpenHashTab when I want to see the hashes of the contents of a folder.

Seriously, I think OpenHashTab needs that function.

This one is about sum files: when you try hashing one such file, OpenHashTab will detect this, and hash and compare the files listed in the file instead. The color coding is rather simple:

  • Green = hash matches
  • Brown = hash matches but is insecure algorithm
  • Red = none of the hashes match
  • Text is red = unreadable (hash will be error message instead)

Okay, I got that part

Dhyfer1 avatar Feb 13 '21 02:02 Dhyfer1

So OpenHashTab is more an app for displaying hashes than for comparison?

for file verification, hash comparison, etc.. I never thought about doing file comparison, since there are less expensive ways to compare two files, which duplicate finders like dupeGuru already use (for example, if two files' sizes mismatch, the files are guaranteed to mismatch, regardless of content).
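The size check mentioned above can be sketched like this. It is a minimal illustration of the general idea, not a description of how dupeGuru works internally; the function name `candidate_groups` is hypothetical.

```python
import os
from collections import defaultdict


def candidate_groups(paths):
    """Group paths by file size; only groups with 2+ files can hold duplicates."""
    by_size = defaultdict(list)
    for path in paths:
        # Reading the size needs no file content, so this step is cheap.
        by_size[os.path.getsize(path)].append(path)
    return [group for group in by_size.values() if len(group) > 1]
```

Only the files in the returned groups would need to be hashed or compared byte by byte; a file with a unique size cannot have a duplicate.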

I think you should consider adding an option to compare files as HashTab

how does HashTab solve the part that you'd need to open it for all 10 files in your example?

namazso avatar Feb 13 '21 13:02 namazso

for file verification, hash comparison, etc…

@Dhyfer1, beware, it is for headaches caused by confusion and bugs as well.

sergeevabc avatar Feb 14 '21 13:02 sergeevabc

@sergeevabc your lack of knowledge of algorithms is not a bug. Just like how SHA-256 and SHA3-256 are different, xxHash64 and xxHash3-64 are different too.
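The SHA-256 versus SHA3-256 point can be checked with Python's standard `hashlib`. The input string is arbitrary, and the xxHash pair is not shown because it would need a third-party package.

```python
import hashlib

data = b"OpenHashTab"
sha2 = hashlib.sha256(data).hexdigest()
sha3 = hashlib.sha3_256(data).hexdigest()

# Both digests are 256 bits (64 hex characters) long, yet they differ
# because the two algorithms are unrelated despite the similar names.
print(len(sha2), len(sha3))
print(sha2 == sha3)
```

The same output length is what makes the two easy to confuse when a hash is pasted into a comparison field.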

The title used to be a gear emoji, but was replaced. I'll probably change it to Settings soon enough, didn't want to bother with missing localizations at release.

winhttp error code seems to not be handled by windows' own formatting, but it stands for invalid ssl certificate. I think it's understandable i didn't test update checker while MitM attacking my connection to github.com

namazso avatar Feb 14 '21 14:02 namazso

how does HashTab solve the part that you'd need to open it for all 10 files in your example?

Maybe I didn't explain myself well or maybe you didn't understand me. As I said before, I use HashTab mostly for the comparison of two or more files, so if I need to compare a hash with 10 files I don't need to close the Properties window 10 times, I just click the Compare a file button to select file #2, if it matches or not I click the Compare a file button again to compare file #3 and so on, all within the Properties window without having to leave the window. I know, maybe it is a long process, but HashTab helps me with its function to compare the hash of different files.

Your software is very good, but even some freeware and shareware programs for hash file calculation also have an option to compare hashes of several files. Now here is another example I want to practice with OpenHashTab, let's say I have a folder with 100 files, some of those files are identical copies of other files in the same folder, others are different copies, you say that OpenHashTab is also for hash comparison, so how can I compare the hash of a file (e.g. the SHA-256 hash value) with the SHA-256 values of the other 99 files and have it tell me which ones match and which ones don't? or should I follow the video instructions 100 times for each file?

Dhyfer1 avatar Feb 14 '21 17:02 Dhyfer1

I see. I never considered duplicate file finding as a use case since there are much faster approaches to that. Do you have any idea for UI to accomplish this? I can't really think of one (well, besides maybe a "find duplicates" button but that's pretty much a last resort)

namazso avatar Feb 15 '21 14:02 namazso

Hmmm. You see what I'm getting at?

Well, in HashTab when I select the hash value of a file (MD5 value in the image of my first comment for example) and click on the Compare a file button, the Open dialog window opens where I can select one file at a time; so I select the file to compare and click the Open button in the dialog window, then HashTab calculates the MD5 hash value of the selected file and tells me if it is the same hash or not, as you saw in the image.

I was also thinking about a button called Compare with file(s) or Compare and find duplicates, and that when I click the button I can select several files at once in the Open dialog, and after OpenHashTab opens all the selected files it tells me all the matching and non-matching files according to the hash I have specified. It would only be necessary to make a few changes in the interface, placing a button and modifying the list of files.

About modifying the list of files according to the selected hash, what I mean is that if I select the MD5 hash to compare for example, then the list that appears in the image of your repository would be the same, with the difference that in the Algorithm column would only appear the hash that I specify, in this case MD5 for all files in list, and like your image, that it shows in separate colors the files that have the same hash and those that do not match.

I hope I have been clear, and that you have understood me.

Dhyfer1 avatar Feb 16 '21 03:02 Dhyfer1

Well, in HashTab...

HashTab's approach sounds like bad UX for your use case. To find matches in 10 files you'd need to use 45 open file dialogs / comparisons. It's also pretty bad performance-wise.

when I click the button I can select several files at once in the Open dialog

why not just hash those initially?

as for the filtering approach, I'll see if I can just hide elements from the ListView, since deleting and readding them would be a bit complicated.

namazso avatar Feb 16 '21 13:02 namazso

HashTab's approach sounds like bad UX for your use case. To find matches in 10 files you'd need to use 45 open file dialogs / comparisons. It's also pretty bad performance-wise.

No. HashTab is a good product, but it has some disadvantages like all programs, and they are the following:

  • As I said before, HashTab can only select one file at a time in the Open dialog box for the comparison of hashes between different files.
  • HashTab cannot find the hash value of folders, it only works with files.
  • When the file is very large, an ISO for example, it takes a long time to calculate its hashes.
  • And it appears that it has not received any further updates since 2016.

But still, with those drawbacks I use it from time to time, and I use it mainly for hashes comparison.

why not just hash those initially?

as for the filtering approach, I'll see if I can just hide elements from the ListView, since deleting and readding them would be a bit complicated.

That's what I'm getting at; I hope these images help. They are amateur edits I made with GIMP, but they get the idea across.

  1. The Check against field is replaced by a button with the same name, or another name for comparing hashes between several files. The Algorithm and Hash columns are still in place. So the image shows the hash values of a file, and if I want to compare the MD5 hash with the MD5 hashes of other files I just double-click the hash I want to compare (MD5 in this case) to copy it to the clipboard, and then click the Check against button. [image: Sin nombre2]

  2. After clicking the Check against button the Open dialog window opens, where I can select more than one file at a time, so I select several files and click the Open button of the dialog box. I have not included an image here because this step is self-explanatory.

  3. Once the files have been selected, the matching and non-matching files are classified by color according to the hash that was copied to the clipboard, along with the file name and the algorithm that was chosen for the comparison. [image: Sin nombre]

  4. If you want to go back to the first image to select another type of hash to compare, perhaps SHA-256, you could put a button next to the Check against button called Return. The Return button could be inactive (grayed out) in the first image and active in the second image.

You told me before: Do you have any idea for UI to accomplish this? well this is my idea, what do you say?

Dhyfer1 avatar Feb 16 '21 18:02 Dhyfer1

No. HashTab is a good product, but it has some disadvantages like all programs

I meant specifically in relation to your use case. As far as i understood, if you wanted to find dupes between files a, b, c, d you'd need to do the following:

  1. Open HashTab at a
  2. Compare against b
  3. Compare against c
  4. Compare against d
  5. Open HashTab at b
  6. Compare against c
  7. Compare against d
  8. Open HashTab at c
  9. Compare against d

This is 9 hashing operations, while the minimum required to determine duplicate clusters is 4. It is also quite a few more steps than optimal.
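The difference in cost can be sketched as follows, assuming SHA-256 as the hash and hypothetical helper names `sha256_of` and `duplicate_clusters`. Each file is hashed exactly once and files are grouped by digest, so 4 files need 4 hashing operations. The pairwise approach needs one comparison per pair, which is 6 for 4 files (plus the 3 reference hashes, giving the 9 above) and 45 for 10 files.

```python
import hashlib
from collections import defaultdict
from math import comb


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_clusters(paths):
    """Hash every file once and return the groups that share a digest."""
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[sha256_of(path)].append(path)
    return [group for group in by_hash.values() if len(group) > 1]


# Number of pairwise comparisons needed without clustering.
print(comb(4, 2))   # 6
print(comb(10, 2))  # 45
```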

well this is my idea, what do you say?

Well, adding files to hash on the fly is currently not supported (as in the code, without refactoring, not just me not exposing it through the UI), so that'd be a bit difficult for now. Currently the code basically assumes that a list of files comes in and a list of hashes goes out, and that what you see is a results screen rather than something "interactive".

However from what I understood, it looks like the same could be accomplished by this instead:

  1. Select all files you want to compare, open OpenHashTab at them

image

  2. Copy the hash of the one you want to see identicals to
  3. Paste into the compare to field
  4. (NEW) The main list only shows matching ones, rather than all

image

(sorry about mspainted images)

This would essentially turn the compare field into a filter

Do you think this would be fine?
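The filter idea proposed above could look roughly like this. It is a sketch only: `filter_by_hash` is a hypothetical name, and the results are assumed to be a plain mapping from file name to hex digest rather than whatever structure OpenHashTab uses internally.

```python
def filter_by_hash(results, wanted):
    """Return the names in `results` (name -> hex hash) whose hash equals `wanted`."""
    # Ignore surrounding whitespace and letter case in the pasted hash.
    needle = wanted.strip().lower()
    return [name for name, digest in results.items() if digest.lower() == needle]
```

Clearing the compare field would then simply show the full list again.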

namazso avatar Feb 16 '21 19:02 namazso

I meant specifically in relation to your use case. As far as i understood, if you wanted to find dupes between files a, b, c, d you'd need to do the following:

  1. Open HashTab at a
  2. Compare against b: coincides with a, so it is discarded
  3. Compare against c: does not match
  4. Compare against d: coincides with a, so it is discarded

This is 9 hashing operations, while the minimum required to determine duplicate clusters is 4. It is also quite a few more steps than optimal.

I get it, but for me it would be much less than that. Suppose I want to compare the MD5 hash of 10 files with HashTab (a,b,c,d,e,f,g,h,i,j) and 6 of those files have the same hash (a,b,d,g,i,j) while the rest do not, but I don't know this until I use HashTab.

  1. Open HashTab at a
  2. Compare against b: it has the same hash as a, so it is discarded. Therefore we are left with c,d,e,f,g,h,i and j.
  3. Compare against c: the file does not match
  4. Compare against d: it has the same hash as a, so it is discarded. Therefore we are left with c,e,f,g,h,i and j.
  5. Compare against e: the file does not match
  6. Compare against f: the file does not match
  7. Compare against g: it has the same hash as a, so it is discarded. Therefore we are left with c,e,f,h,i and j.
  8. Compare against h: the file does not match
  9. Compare against i: it has the same hash as a, so it is discarded. Therefore we are left with c,e,f,h and j.
  10. Compare against j: it has the same hash as a, so it is discarded. Therefore we are left with c,e,f and h.

A rule in mathematics says that if A equals B and B equals C, then A equals C; apply that to my example above. In your example you opened files a, b and c to compare, but I only need to open one file and compare it with the others, as in my example. I edited your answer so that you could also see how it would look in your example.

However from what I understood, it looks like the same could be accomplished by this instead:

  1. Select all files you want to compare, open OpenHashTab at them
  2. Copy the hash of the one you want to see identicals to
  3. Paste into the compare to field
  4. (NEW) The main list only shows matching ones, rather than all

This would essentially turn the compare field into a filter

Do you think this would be fine?

Yes, I think so, I like that idea. A filter with options to show those that match and those that do not match, and some option to return to the hashes of the original file as I indicated in point 4 of my previous comment, in case I want to use a different hash for comparison.

Dhyfer1 avatar Feb 16 '21 21:02 Dhyfer1

  1. Copy the hash of the one you want to see identicals to

Maybe something simpler? Select the hash on the list, and it automatically highlights each matching hash. No need to copy and paste into the filter box.

piotr-kubiak avatar Feb 17 '21 00:02 piotr-kubiak

However from what I understood, it looks like the same could be accomplished by this instead:

  1. Select all files you want to compare, open OpenHashTab at them

image

  2. Copy the hash of the one you want to see identicals to
  3. Paste into the compare to field
  4. (NEW) The main list only shows matching ones, rather than all

image

(sorry about mspainted images)

This would essentially turn the compare field into a filter

Do you think this would be fine?

Hi. It has been 3 months since my last answer and I still haven't seen any progress on the function to compare the hash of two or more files. Why is there no progress on the implementation of this feature?

Dhyfer1 avatar May 25 '21 17:05 Dhyfer1

because I'm busy (I have a job and education) and this is unpaid hobby work. UI work is lower priority too, since I plan to move over from the current UI design to a treelistview, which has a rather different look and interactions, so most other changes would be wasted. Unfortunately that takes effort, mainly on the build side, since mCtrl was mainly designed around standalone programs using it dynamically linked, while I use it statically linked in a side-by-side configuration (required for shell extensions) and haven't quite figured out yet what's not initializing in the desired order.

That being said, I welcome pull requests, so if you'd want to implement something related to this, feel free to.

namazso avatar May 27 '21 15:05 namazso

OK, like most repositories hosted on GitHub, yours also needs time to grow. I just hope that when you have enough time you can implement the compare files feature; as I told you before, it is a very useful feature. Until OpenHashTab gets that change I will keep using HashTab. That's all.

Dhyfer1 avatar May 30 '21 19:05 Dhyfer1

For anyone else looking to do this: as was pointed out, @Dhyfer1 was going about it the wrong way. dupeGuru and WinMerge, both excellent programs designed for this type of task, are just two examples of tools that would work much more efficiently for this than doing it with HashTab. I'm all for adding functionality, and I certainly wouldn't mind OHT building it in, as it's always nice to have another tool for approaching a task in a different way. But considering that it's outside the main scope of OHT and that there are other established, good alternatives for accomplishing this type of job, it makes sense to make it a low priority. Unfortunately, @Dhyfer1 was so concerned with trying to get OHT to do what they wanted that they skipped right over the point of using a better tool for the job.

If this were to be implemented in the future, I'd say there are a few things that could be done. First, have it automatically (or when a button to do so is clicked) mark duplicates somehow. Second, allow sorting to put duplicates together. Third, allow a duplicate to be right-clicked and have an option to show only duplicates of that file and/or to color-code them. Of course, if/when tree view is implemented, that will change a lot and there may be different/better ways to manage this.
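The first two suggestions (marking duplicates and sorting them together) could be sketched as below. The function name `mark_and_sort` and the row layout are hypothetical and say nothing about how a future OHT tree view would actually handle this.

```python
from collections import Counter


def mark_and_sort(results):
    """Take (name, hex_hash) pairs and return (name, hex_hash, is_duplicate) rows.

    Rows are ordered so that duplicates come first and equal hashes are adjacent.
    """
    counts = Counter(digest for _, digest in results)
    rows = [(name, digest, counts[digest] > 1) for name, digest in results]
    # Duplicates first, then by hash, then by name for a stable display order.
    return sorted(rows, key=lambda row: (not row[2], row[1], row[0]))
```

A UI could then color or group rows based on the boolean flag in each row.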

vertigo220 avatar Jul 09 '22 00:07 vertigo220

Also, this is a duplicate of #8 (compare against other files) and #10 (color to show duplicates).

vertigo220 avatar Jul 09 '22 00:07 vertigo220