hashdeep
New Feature for hashdeep: Audit using two known_hashes.txt
Instead of using one hashes file and running hashdeep against a live filesystem, it would be great to be able to pass a second known_hashes.txt file in place of the filename (or directory name, when using -r).
Rationale:
- Two hashdeep runs creating the hash files can be executed in parallel on two different filesystems (i.e. two different physical drives). hashdeep is usually I/O bound, not CPU bound.
- Also, if the audit fails, there is no need to run hashdeep again and wait another day to compare 200 GB of data just because you forgot to specify -vv instead of -v to find out which files were bad. (A sketch of the current versus requested workflow follows below.)
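To make the request concrete, here is roughly what the workflow looks like today versus what this would enable; -c, -r, -a, -vv and -k are existing hashdeep options, and the paths are only placeholders:

# Current workflow: the two drives have to be read one after the other.
hashdeep -c md5,sha256 -r /mnt/source > source_hashes.txt
hashdeep -c md5,sha256 -r -a -vv -k source_hashes.txt /mnt/destination

# Requested workflow: hash both drives in parallel ...
hashdeep -c md5,sha256 -r /mnt/source > source_hashes.txt &
hashdeep -c md5,sha256 -r /mnt/destination > destination_hashes.txt &
wait
# ... then audit the two resulting files against each other,
# which hashdeep cannot currently do.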
Looking to see if this was possible was exactly the reason I logged into GitHub tonight. This would be especially useful when creating known-hash files simultaneously on two separate and large drives, and then being able to audit them together. Also, as in my case tonight (after 7 hours of auditing), I could then run it again with a different verbosity or negative/positive matching mode.
It would be very handy to have this feature.
It took two days to run for me, so being able to do it in parallel and then use the two files later would be a huge boon.
Can't you just run in two windows?
You can't run an audit with two known-hashes files; you can only run it with a known-hashes file and a directory/file. So opening another shell wouldn't help: you'd end up with two known-hashes files and no way to use them to do an audit. :disappointed:
This would be very handy. What I've done is import two hash files into Access and run a "Not Matching" query against them.
Just commenting to say, "Me, too." This would be super useful in the case of multiple systems that have copies of the same data. One could set up a cron job to periodically create audit files on all systems, then pick and choose which "snapshots" to compare without a lot of overhead.
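For what it's worth, a crontab entry along these lines could generate those periodic hash files; the schedule, paths, and naming scheme are only placeholders, and the % characters have to be escaped because cron treats them specially:

# m h dom mon dow  command
0 3 * * 0  hashdeep -c md5,sha256 -r /data > /var/lib/hashdeep/$(hostname)-$(date +\%F).txt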
To work around this issue, I am using the bash command join. I have two files: the first contains the source path's MD5 and SHA-256 hashes, and the second contains the destination path's MD5s. I want to compare the two and output only the MD5s in the source list that have no match in the destination list (you can also do the reverse).
join -t , -v 1 -1 2 -2 2 -o 1.4 <(sort -t , -k 2 /path/to/sourceHashdeepMD5SHA256.txt) <(sort -t , -k 2 /path/to/destinationHashdeepMD5.txt)
Here is what all this means:
- join -t , : use a comma as the delimiter
- -v 1 : print the non-matches from file1 (matches are given if you change the "v" to an "a")
- -1 2 : join on column 2 of file1
- -2 2 : join on column 2 of file2
- -o 1.4 : output field 4 of file1, which is the file's path in the default hashdeep list
- <(sort -t , -k 2 /path/to/sourceHashdeepMD5SHA256.txt) : file1, sorted on column 2 using a comma as the delimiter
- <(sort -t , -k 2 /path/to/destinationHashdeepMD5.txt) : file2, sorted on column 2 using a comma as the delimiter
read "man join" for more info about how to tweak parameters or add more columns
But this is just a temporary fix. I would be grateful if a command were added to the main program to compare two lists and output either the matches or the non-matches for a given hash type (optionally restricted to list1, list2, or both lists).
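Until something like that exists in hashdeep itself, here is a rough sketch of that comparison as a small bash script built on the same join approach. The script name, the matches/only1/only2 mode names, and the hash-column argument are all made up for illustration, and the column number depends on which algorithms the lists were generated with (2 is the MD5 column in the default layouts):

#!/bin/bash
# compare_lists.sh LIST1 LIST2 HASH_COLUMN {matches|only1|only2}
# Compares two hashdeep output files on the given 1-based hash column.
list1=$1; list2=$2; col=$3; mode=$4

# Drop hashdeep's %%%% / ## header lines and sort on the hash column.
prep() { grep -v '^[%#]' "$1" | sort -t , -k "$col,$col"; }

case $mode in
    matches) opt="-j $col" ;;              # entries whose hash appears in both lists
    only1)   opt="-v 1 -1 $col -2 $col" ;; # entries only in LIST1
    only2)   opt="-v 2 -1 $col -2 $col" ;; # entries only in LIST2
    *) echo "usage: $0 list1 list2 hash_column {matches|only1|only2}" >&2; exit 1 ;;
esac

join -t , $opt <(prep "$list1") <(prep "$list2")

For example, ./compare_lists.sh source.txt destination.txt 2 only1 would reproduce the non-matching-source query above.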