Add Audit Mode

Open boyter opened this issue 10 months ago • 4 comments

It is time to add this last piece to fully do everything that hashdeep does. Thoughts and ideas below.

  1. Should maintain 100% compatibility with hashdeep output. This allows for independent verification. Having two tools that can verify is great for the paranoid, and it serves as an implementation verifier as well.
  2. This will mean accepting the hashdeep output format as an input for verification.
  3. We want the verifier to scale to levels that hashdeep cannot reach, including RAM-limited environments where loading a 20 GB verification file might not be possible.
  4. Having an output format for verification that is not dependent on hashdeep should be supported as well. I propose using SQLite for this. This would allow other scripts and processes to connect and verify without needing to build their own custom parsers for the format.
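
Points 2 and 3 together suggest reading the hashdeep file as a stream rather than loading it whole. A minimal sketch of that idea follows, assuming the usual hashdeep layout (a `%%%% HASHDEEP-1.0` line, a `%%%% size,md5,sha256,filename` column header, `##` comment lines, then one comma-separated record per file). The function name `iter_hashdeep_records` is hypothetical and is not part of hashit or hashdeep.

```python
import io
from typing import Dict, Iterator, TextIO


def iter_hashdeep_records(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per file entry, reading the input line by line."""
    columns = None
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("%%%% "):
            body = line[5:]
            # The first header line is the format version; the second one
            # names the columns, e.g. size,md5,sha256,filename.
            if not body.startswith("HASHDEEP"):
                columns = body.split(",")
            continue
        if line.startswith("#"):
            continue  # comment lines such as "## Invoked from: ..."
        if columns is None:
            raise ValueError("data line seen before the column header")
        # The filename is the last column and may itself contain commas,
        # so split at most len(columns) - 1 times.
        fields = line.split(",", len(columns) - 1)
        if len(fields) != len(columns):
            raise ValueError(f"malformed line: {line!r}")
        yield dict(zip(columns, fields))


# Hypothetical sample input: one empty file whose name contains a comma.
SAMPLE = """%%%% HASHDEEP-1.0
%%%% size,md5,sha256,filename
## Invoked from: /home/user
## $ hashdeep -r processor
##
0,d41d8cd98f00b204e9800998ecf8427e,e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855,/home/user/processor/a,b.txt
"""

for record in iter_hashdeep_records(io.StringIO(SAMPLE)):
    print(record["filename"], record["size"])
```

Because the function is a generator, memory use depends on the longest line rather than on the size of the audit file. Whatever consumes the records still has to decide how much state to keep.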

Hashdeep can verify files like so:

$ hashit --format hashdeep processor > audit.txt && hashdeep -l -r -a -v -k audit.txt processor
hashdeep: Audit passed
          Files matched: 9
Files partially matched: 0
            Files moved: 0
        New files found: 0
  Known files not found: 0

Note that you have to ensure the output file does not land inside the directory being verified, since that would affect the verification; hence running it against the processor folder in the above.

Hashdeep is doing a few things here.

  1. Confirming that all the files in the audit exist.
  2. Confirming if any of the files have changed.
  3. Confirming if any of the files have moved.
  4. Notifying about any new files.
  5. Reporting any missing files.
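
The checks above can be sketched as a comparison between two path-to-digest mappings, one from the stored audit and one from the current scan. This is an illustration only, not hashdeep's or hashit's actual algorithm; `classify_audit` and its category names are hypothetical, and hashdeep's own counters may bucket a modified file differently.

```python
from typing import Dict, List


def classify_audit(known: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare a stored audit (path -> digest) with a fresh scan (path -> digest)."""
    # Index the stored audit by digest so moved files can be recognised.
    known_by_digest: Dict[str, List[str]] = {}
    for path, digest in known.items():
        known_by_digest.setdefault(digest, []).append(path)

    result: Dict[str, List[str]] = {
        "matched": [], "changed": [], "moved": [], "new": [], "missing": [],
    }
    accounted = set()  # known paths that something on disk explains

    for path, digest in sorted(current.items()):
        if known.get(path) == digest:
            result["matched"].append(path)
            accounted.add(path)
        elif path in known:
            # Same path, different content.
            result["changed"].append(path)
            accounted.add(path)
        else:
            # Unknown path: it is a move only if a known file with the same
            # digest has disappeared from its old location.
            sources = [
                p for p in sorted(known_by_digest.get(digest, []))
                if p not in current and p not in accounted
            ]
            if sources:
                result["moved"].append(path)
                accounted.add(sources[0])
            else:
                result["new"].append(path)

    result["missing"] = sorted(p for p in known if p not in accounted)
    return result


known = {"a": "h1", "b": "h2", "c": "h3", "d": "h4"}
current = {"a": "h1", "b": "hX", "e": "h3", "f": "h9"}
print(classify_audit(known, current))
```

An "exact match" mode would then pass only when every list other than `matched` is empty, while a looser mode could ignore `new`.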

In effect it works with what could be two options.

  1. Tell me if the files I have seen previously are still here, moved or modified.
  2. Tell me if this matches my previous audit exactly.

Both seem like they could be options to include, but having a hashdeep-compatible layer by default would be a good idea.

I propose having the following:

  1. Like-for-like output matching what we see with hashdeep. All of its output types need to be confirmed when doing this, however.
  2. Have an option to do the "Tell me if the files I have seen previously are still here, moved or modified."
  3. Have an option to do the "Tell me if this matches my previous audit exactly."

boyter avatar Mar 15 '25 05:03 boyter

Having the ability to check against every hash computed would be nice too. While it is infeasible to craft a collision attack against 3 hashes at once, assuming one of them is SHA256 or stronger, being able to use more hashes, or even all supported hashes, could be an option for the truly paranoid.
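
As a rough sketch of what checking every hash could look like, the snippet below feeds each chunk of a file to several hashers in a single pass and then requires every expected digest to match. It uses Python's standard `hashlib`; the helper names `hash_stream` and `all_digests_match` are hypothetical and do not reflect how hashit implements this.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable


def hash_stream(stream: BinaryIO, algorithms: Iterable[str],
                chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Read the stream once and return a hex digest per algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Every hasher sees the same bytes, so the data is read only once.
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def all_digests_match(expected: Dict[str, str], actual: Dict[str, str]) -> bool:
    """True only if at least one digest is expected and all of them agree."""
    return bool(expected) and all(
        actual.get(name) == digest for name, digest in expected.items()
    )


digests = hash_stream(io.BytesIO(b"hello"), ["md5", "sha1", "sha256"])
print(all_digests_match(digests, digests))
```

Requiring all digests to agree means a single mismatch fails the file, which is the strict behaviour the paranoid case calls for.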

boyter avatar Mar 15 '25 05:03 boyter

Reading https://linhost.info/2010/05/using-hashdeep-to-ensure-data-integrity/ to determine how the options work.

https://www.geeksforgeeks.org/hashdeep-a-digital-forensics-tool-in-kali-linux/

boyter avatar Mar 15 '25 05:03 boyter

Seems hashdeep has issues with files that have Unicode characters in the filename.

boyter avatar Mar 15 '25 06:03 boyter

Work started on hashdeep audit https://github.com/boyter/hashit/compare/audit

boyter avatar Mar 19 '25 08:03 boyter

It would be useful if last audit time and file mtime columns were also added to the SQL schema, so that equivalents of snapraid scrub or git annex fsck --incremental-schedule 90d --time-limit 8h could be implemented by hashit audit, and the integrity of a petabyte dataset could be maintained by an incremental cron job.
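
A minimal sketch of how such columns might drive an incremental audit, using Python's built-in `sqlite3`. The table name `audit`, its columns, and the helpers `stale_files` and `mark_audited` are all assumptions for illustration; no SQLite schema for hashit is defined in this thread.

```python
import sqlite3
from typing import List

# Hypothetical schema: one row per file, with the two extra timestamps.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit (
    filename   TEXT PRIMARY KEY,
    size       INTEGER NOT NULL,
    sha256     TEXT NOT NULL,
    mtime      REAL NOT NULL,   -- file modification time, Unix seconds
    last_audit REAL NOT NULL    -- when the hash was last verified
)
"""

DAY = 86400.0


def stale_files(conn: sqlite3.Connection, now: float,
                max_age_seconds: float, limit: int) -> List[str]:
    """Return up to `limit` files not verified within the window, oldest first."""
    cutoff = now - max_age_seconds
    rows = conn.execute(
        "SELECT filename FROM audit WHERE last_audit < ? "
        "ORDER BY last_audit ASC LIMIT ?",
        (cutoff, limit),
    )
    return [row[0] for row in rows]


def mark_audited(conn: sqlite3.Connection, filename: str, now: float) -> None:
    """Record that the file was re-verified at time `now`."""
    conn.execute("UPDATE audit SET last_audit = ? WHERE filename = ?", (now, filename))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
now = 100 * DAY
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, "aa", 0.0, 0.0),             # verified 100 days ago
        ("b", 1, "bb", 0.0, 5 * DAY),         # verified 95 days ago
        ("c", 1, "cc", 0.0, 50 * DAY),        # verified 50 days ago
    ],
)
print(stale_files(conn, now, 90 * DAY, limit=10))
```

A cron job could call `stale_files` with a batch size, re-hash only those files, and stop when its time budget runs out. Comparing the stored `mtime` with the file's current one could also help tell an intentional edit from silent corruption.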

Arnie97 avatar Sep 25 '25 02:09 Arnie97

Yeah seems like a decent thing to add.

boyter avatar Sep 25 '25 03:09 boyter

Right, audit functionality has landed in the master branch after this PR was merged: https://github.com/boyter/hashit/pull/22

Closing this down for now as the functionality is now there; however, it will need to be expanded as we hit issues from people using it in ways I didn't expect.

boyter avatar Dec 01 '25 02:12 boyter

I should mention that the SQLite version for scaling is not in this, but it is something I will want to add later. For now hashdeep compatibility is in, meaning that in theory you can migrate away from hashdeep if you so choose, or ideally have another tool to verify with.

boyter avatar Dec 01 '25 02:12 boyter