New Feature for hashdeep: Audit using two known_hashes.txt

Open chrisly42 opened this issue 9 years ago • 7 comments

Instead of using one hashes-file and running hashdeep on a live filesystem, it would be cool to just give a second known_hashes.txt file instead of a filename (or directory name, when using -r).
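
In purely hypothetical syntax, the request would look something like this (-a audit mode and -k already exist in hashdeep; the new part would be accepting a second hash list as the audit target, which is not supported today, and the file names here are just placeholders):

hashdeep -a -k driveA_hashes.txt driveB_hashes.txt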

Rationale:

  • Two hashdeep runs creating the hash lists can be run in parallel on two different filesystems (i.e. two different physical drives), as sketched below. Usually, hashdeep is I/O bound, not CPU bound.
  • Also, if the audit fails, there is no need to run hashdeep again and wait another day to compare 200 GB of data just because you forgot to specify -vv instead of -v to find out which files were bad.
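
For example, the two hash lists can already be generated in parallel today with something like this (a minimal sketch using existing hashdeep options; the mount points and output names are placeholders):

# hash two physical drives at the same time, one hashdeep process per drive
hashdeep -r /mnt/driveA > driveA_hashes.txt &
hashdeep -r /mnt/driveB > driveB_hashes.txt &
wait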

chrisly42 avatar Jun 15 '15 13:06 chrisly42

Looking to see if this was possible was exactly the reason I logged into GitHub tonight. This would be especially useful when creating known hash files simultaneously on two separate and large drives, and then auditing them together. Also, as happened to me tonight (after 7 hours of auditing), I could run it again with a different verbosity or negative/positive matching mode.

It would be very handy to have this feature.

madivad avatar Feb 11 '16 12:02 madivad

It took two days to run for me so being able to do it in parallel and then use the two files later would be a huge boon.

dvicory avatar Mar 17 '16 21:03 dvicory

Can't you just run in two windows?

simsong avatar Mar 18 '16 01:03 simsong

You can't run an audit with two known hashes files. You can only run it with known hashes and a directory/file, so opening another shell wouldn't help; you'd end up with two known hashes files and no way to use them to do an audit. :disappointed:
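
For reference, an audit today looks something like this (the paths and file names are placeholders), and there is no way to put a second hash list where the directory argument goes:

hashdeep -r -a -k driveA_hashes.txt /mnt/driveB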

dvicory avatar Mar 18 '16 01:03 dvicory

This would be very handy. What I've done is import two hash files into Access and run a "Not Matching" query against them.

rccipriani avatar May 03 '16 16:05 rccipriani

Just commenting to say, "Me, too." This would be super useful in the case of multiple systems that have copies of the same data. One could set up a cron job to periodically create audit files on all systems, then pick and choose which "snapshots" to compare without a lot of overhead.
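
For instance, a crontab entry along these lines could take the periodic snapshots (the schedule, data path, and output location are placeholders, and comparing the snapshots would still need an external step until this feature exists):

# nightly hashdeep snapshot, written to a dated file
0 2 * * * hashdeep -r /data > /var/hashsnapshots/data_$(date +\%F).txt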

meeotch avatar Nov 19 '16 03:11 meeotch

To work around this issue, I am using the bash command join. I have two files: the 1st file with the source path's MD5s and SHA256s, the 2nd file with the destination path's MD5s. I want to compare the two, outputting only the MD5s in the source list that have no match in the destination list (you can also do the reverse).

join -t , -v 1 -1 2 -2 2 -o 1.4 <(sort -t , -k 2 /path/to/sourceHashdeepMD5SHA256.txt) <(sort -t , -k 2 /path/to/destinationHashdeepMD5.txt)

Here is what all this means:

  • join -t , uses a comma as the delimiter
  • -v 1 prints non-matches from file1 (matches are given if you change the "v" to an "a")
  • -1 2 joins on column 2 in file1
  • -2 2 joins on column 2 in file2
  • -o 1.4 displays field 4 of file1, which is the file's path in the default hashdeep list
  • <(sort -t , -k 2 /path/to/sourceHashdeepMD5SHA256.txt) is file1, sorted on column 2 using a comma as the delimiter
  • <(sort -t , -k 2 /path/to/destinationHashdeepMD5.txt) is file2, sorted the same way

Read "man join" for more info about how to tweak the parameters or add more columns.
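
For the reverse direction (destination entries with no match in the source), a sketch along the same lines would be the following; in an MD5-only hashdeep list the path is field 3, so the output option changes accordingly:

join -t , -v 2 -1 2 -2 2 -o 2.3 <(sort -t , -k 2 /path/to/sourceHashdeepMD5SHA256.txt) <(sort -t , -k 2 /path/to/destinationHashdeepMD5.txt)

Note that hashdeep's header lines (they start with %%%% and ##) may need to be stripped first, e.g. with grep -v '^[%#]', or they can show up as spurious non-matches.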

But this is just a temporary fix - I would be grateful if a command were added to the main program to compare two lists and output either matches of a certain hash type or non-matches of a certain hash type (optionally in list1, list2, or both lists).

fileintegrity avatar Nov 10 '17 06:11 fileintegrity