
add an XATTRs based mode

Open calestyo opened this issue 8 years ago • 5 comments

Hi.

One problem with keeping hash sums in list files is that once the files are moved around, one is basically lost when it comes to verifying them.

A nice solution for this is the use of user XATTRs, i.e. storing the hash sums (and whatever other metadata may be required) in XATTRs and reading them from there for verification etc. One can also easily store multiple algorithms, simply by using a different XATTR for each.
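To make the idea concrete, here is a minimal sketch of storing per-algorithm sums in user XATTRs. The `user.hash.<algo>` naming is an assumption of mine, not an existing hashdeep convention, and `os.setxattr` is Linux-only, so the call is guarded:

```python
# Sketch: store per-algorithm hash sums in user xattrs.
# The xattr name scheme (user.hash.sha256, user.hash.md5, ...) is
# illustrative only, not an existing hashdeep convention.
import hashlib
import os

def xattr_name(algo: str) -> str:
    # One xattr per algorithm, so multiple algos can coexist on one file.
    return f"user.hash.{algo}"

def file_digest(path: str, algo: str) -> str:
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def store_hashes(path: str, algos=("md5", "sha256")) -> dict:
    sums = {a: file_digest(path, a) for a in algos}
    for a, hexsum in sums.items():
        if hasattr(os, "setxattr"):  # Linux only
            try:
                os.setxattr(path, xattr_name(a), hexsum.encode())
            except OSError:
                pass  # filesystem without user xattr support
    return sums
```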

The downside compared to a hash sums list file is obviously that: a) if files accidentally/maliciously get lost/removed, no one might notice, as there is no central list of all files that should be there; b) the tree structure, names, etc. of the files themselves may be of (security) value to some people.

Either one could, depending on the usage scenario, just live with these downsides (or security against such attacks may already be provided at another level).

Or one could make a kind of mixed mode, which does both, XATTRs and file lists. I haven't fully thought that through, but the XATTRs would probably store everything that's file-content related (i.e. hash sums of the whole file, or of blocks of the file), while the file list would keep track of the tree hierarchy, file names, metadata like dates, permissions, etc. ... and changes to these.
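A sketch of what one entry of such a mixed-mode file list might look like: tree/metadata come from `lstat`, while the content hash is read back from a user XATTR instead of re-hashing the file. The `user.hash.sha256` name is my assumed convention from above, and `os.getxattr` is Linux-only, hence the guards:

```python
# Sketch: build one file-list entry from metadata plus the hash already
# stored in a user xattr (assumed name user.hash.sha256).
import os
import stat

def list_entry(path: str) -> dict:
    st = os.lstat(path)
    hexsum = None
    if hasattr(os, "getxattr"):  # Linux only
        try:
            hexsum = os.getxattr(path, "user.hash.sha256").decode()
        except OSError:
            pass  # xattr missing or unsupported: would need a re-hash
    return {
        "path": path,
        "type": stat.filemode(st.st_mode)[0],  # '-', 'd', 'l', ...
        "size": st.st_size,
        "mtime": int(st.st_mtime),
        "mode": stat.S_IMODE(st.st_mode),
        "sha256": hexsum,
    }
```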

As files are moved around on the system, the XATTRs would stay up to date, but the file lists would of course get "out of date".

  • Creating a new file list should be rather fast: the files' contents wouldn't need to be re-read (their sums are already stored in the XATTRs), and everything else (paths, names, dates, permissions, file types) is just metadata.
  • The bigger difficulty is to actually be able to use "out of date" file lists to do audits against the live filesystem: e.g. which files were removed/moved by an attacker, or just accidentally, and if just moved, what are their new locations? Which files have changed their file type (e.g. from regular file to symlink)? The most obvious approach would be to use the hash sum as a key to match between the actual filesystem and the (out-of-date) file list. Here, however, comes the big problem: just because two hash sums are identical, that doesn't mean these are/were the same file. Another problem may be hard-linked files, and, looking at more modern (CoW) filesystems like btrfs, reflink-copied files.
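The hash-as-key matching above can be sketched roughly like this; duplicate sums make a match ambiguous (identical content does not mean the same file), so those cases are reported separately instead of guessed at. This is a toy model over `{path: hexsum}` dicts, not anything hashdeep does today:

```python
# Sketch: audit a live tree against an out-of-date file list, keyed by
# hash sum. Ambiguous matches (duplicate sums) are flagged, not resolved.
from collections import defaultdict

def audit(old_list: dict, live: dict) -> dict:
    # old_list, live: {path: hexsum}
    old_by_sum = defaultdict(set)
    for path, s in old_list.items():
        old_by_sum[s].add(path)
    live_by_sum = defaultdict(set)
    for path, s in live.items():
        live_by_sum[s].add(path)

    moved, ambiguous, removed = {}, [], []
    for path, s in old_list.items():
        if live.get(path) == s:
            continue  # unchanged, still at its old location
        candidates = live_by_sum.get(s, set()) - set(old_list)
        if len(candidates) == 1 and len(old_by_sum[s]) == 1:
            moved[path] = next(iter(candidates))  # unique match: moved
        elif candidates:
            ambiguous.append(path)  # several files share this sum
        else:
            removed.append(path)  # content no longer found anywhere
    return {"moved": moved, "ambiguous": ambiguous, "removed": removed}
```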

One approach could be not to use the hash sums as a key, but to introduce a new ID, one per file, and probably another one per filesystem (e.g. its UUID). When hashdeep creates its hash lists, it could store another XATTR on each file: a sequential number giving the ID of that file in the filesystem. If files are copied/moved from another filesystem (with another filesystem ID and possibly conflicting sequential file IDs), it would offer to "import" these into the current filesystem (which would of course mean that it searches for the next free sequential file IDs for all these files, and sets the filesystem ID of the current fs on them).
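A minimal in-memory sketch of that ID scheme, assuming a per-filesystem UUID and a sequential counter; class and method names are hypothetical:

```python
# Sketch: sequential per-file IDs scoped to a filesystem UUID.
# Importing a file from a foreign filesystem re-assigns a fresh local ID
# instead of reusing the (possibly conflicting) foreign one.
class IdRegistry:
    def __init__(self, fs_uuid: str):
        self.fs_uuid = fs_uuid
        self.next_id = 1
        self.ids = {}  # path -> (fs_uuid, file_id)

    def assign(self, path: str) -> tuple:
        # Give a new file the next free sequential ID on this filesystem.
        if path not in self.ids:
            self.ids[path] = (self.fs_uuid, self.next_id)
            self.next_id += 1
        return self.ids[path]

    def import_foreign(self, path: str, foreign_uuid: str,
                       foreign_id: int) -> tuple:
        # The foreign (uuid, id) pair may collide with local numbering,
        # so allocate a fresh local ID under the local filesystem UUID.
        self.ids[path] = (self.fs_uuid, self.next_id)
        self.next_id += 1
        return self.ids[path]
```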

As files are removed over time, gaps would turn up in the sequential file IDs. This isn't a problem per se (the IDs should be large enough to hold any imaginable number of files), but there could still be a "clean up" mode that closes the gaps (while at the same time updating the file lists and XATTRs).

Sounds complicated? Yeah,.. it probably is =)

Cheers, Chris.

calestyo avatar Jul 29 '15 03:07 calestyo