Add an XATTR-based mode
Hi.
One problem with having files of sums is that when the files are moved around, one basically loses the ability to verify them.
A nice solution for this is to use user XATTRs, i.e. store the hash sums (and whatever other metadata may be required) in XATTRs and read them from there for verification etc. One can also easily store multiple algorithms, simply by using different XATTRs.
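As a minimal sketch of what that could look like (the `user.checksum.<algo>` xattr name is just an assumption here, not something hashdeep or any other tool has standardized, and `os.setxattr`/`os.getxattr` are Linux-only):

```python
import hashlib
import os

def hash_file(path, algo="sha256"):
    """Compute the hex digest of a file's contents."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def store_hash(path, algo="sha256"):
    """Store the digest in a user xattr; one xattr per algorithm."""
    digest = hash_file(path, algo)
    # Linux requires the "user." namespace for unprivileged xattrs;
    # the exact attribute name is made up for this sketch.
    os.setxattr(path, f"user.checksum.{algo}", digest.encode())
    return digest

def verify_hash(path, algo="sha256"):
    """Re-hash the file and compare against the stored xattr."""
    stored = os.getxattr(path, f"user.checksum.{algo}").decode()
    return stored == hash_file(path, algo)
```

Since the xattr travels with the file on rename/move (within the same filesystem, and across copies if the tool preserves xattrs), verification keeps working without an external sums file.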
The obvious downsides of this, compared to a hash sums list file, are that: a) if files get accidentally or maliciously lost/removed, nobody might notice, as there is no central list of all the files that should be there; and b) the tree structure, names, etc. of the files themselves may be of (security) value to some people.
Depending on the usage scenario, one could either just live with these downsides (or maybe security against such attacks is already provided at another level).
Or one could make a kind of mixed mode that does both, XATTRs and file lists. I haven't thought that fully through, but the XATTRs would probably store everything that is file-content related (i.e. hash sums of the whole file, or of blocks of the file), while the file list would keep track of the tree hierarchy, file names, metadata like dates and permissions, and changes to these.
As files are moved around on the system, the XATTRs would stay up to date, but the file lists would of course get "out of date".
- Creating a new file list should be rather fast: the files' contents wouldn't need to be re-read (their sums are already stored in the XATTRs), and everything else (paths, names, dates, permissions, file types) is just metadata.
- The bigger difficulty is actually being able to use "out of date" file lists to audit the live filesystem: e.g. which files were removed/moved by an attacker, or just accidentally? And if just moved, what are their new locations? Which files have changed their file type (e.g. from regular file to symlink)? The most obvious approach would be to use the hash sum as a key to match entries in the (out of date) file list against the actual filesystem. Here comes, however, the big problem: just because two hash sums are identical doesn't mean these are/were two identical files. Another problem may be hard-linked files and, looking at more modern CoW filesystems like btrfs, reflink-copied files.
One approach could be not to use the hash sums as a key, but to introduce a new ID for each file, and probably another one for each filesystem (e.g. its UUID). When hashdeep creates its hash lists, it could store another xattr on each file: a sequential number giving the ID of that file within the filesystem. If files are copied/moved from another filesystem (with another filesystem ID and possibly conflicting sequential file IDs), it would offer to "import" these into the current filesystem (which would of course mean searching for the next free sequential file IDs for all these files, and setting the current filesystem's ID on them).
As files are removed over time, gaps would turn up in the sequence of file IDs. This isn't a problem per se (the IDs should be large enough to hold any imaginable number of files), but there could still be a "clean up" mode that closes the gaps (while at the same time updating the file lists and XATTRs).
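Purely to illustrate the ID scheme (nothing here is existing hashdeep behavior; the class and method names are made up), the allocation/import logic could be as simple as:

```python
class FileIdAllocator:
    """Hand out sequential per-filesystem file IDs, as sketched above.
    A file's identity is the pair (filesystem UUID, sequential ID)."""

    def __init__(self, fs_uuid, next_id=1):
        self.fs_uuid = fs_uuid  # e.g. the filesystem's UUID
        self.next_id = next_id  # next free sequential file ID

    def allocate(self):
        """Assign the next free ID to a newly tracked file."""
        fid = self.next_id
        self.next_id += 1
        return (self.fs_uuid, fid)

    def import_file(self, foreign_fs_uuid, foreign_id):
        """'Import' a file copied/moved from another filesystem:
        files already tagged with our UUID keep their ID, foreign
        ones get a fresh ID under the current filesystem's UUID."""
        if foreign_fs_uuid == self.fs_uuid:
            return (foreign_fs_uuid, foreign_id)
        return self.allocate()
```

Gaps left by removed files are harmless with this scheme; a clean-up pass would just renumber and rewrite the corresponding xattrs and list entries.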
Sounds complicated? Yeah,.. it probably is =)
Cheers, Chris.
I don't quite understand the problem which you're attempting to solve here. Can you give an example of what goes wrong and how it should be? How would using XATTRs help?
I think he wants to have the hash move with the file, so that he doesn't need to maintain an external list of file hashes.
The problem is, an attacker who compromises a system can then simply re-compute the hashes. So it's not clear what you would use this for.
There are a couple of tools that support storing hashes in xattrs, just look at:
- https://github.com/ColumPaget/Hashrat
- https://weakish.github.io/shattr/
- https://bitbucket.org/maugier/shatag
It would be so nice if they all used some kind of common schema. (To me, ideally, storing the pair: the mtime at the time the hash was computed, and the hash itself, so one can recognize that the mtime has changed since the last hash calculation and invalidate the hash accordingly.) (How nice it would be if filesystems offered such "ephemeral" xattrs, which disappear/are marked as discarded when the file is modified.)
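A sketch of such an (mtime, hash) pair scheme, assuming a made-up `user.checksum.state` xattr name (no such schema actually exists) and Linux's `os.setxattr`/`os.getxattr`:

```python
import hashlib
import json
import os

# The xattr name is an assumption; there is no agreed-upon schema yet.
XATTR_STATE = "user.checksum.state"

def record_state(path):
    """Store (mtime, sha256) together, so a later run can tell
    whether the stored hash is stale (Linux-only xattr calls)."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    state = {"mtime": os.stat(path).st_mtime, "sha256": digest}
    os.setxattr(path, XATTR_STATE, json.dumps(state).encode())

def stored_hash_if_fresh(path):
    """Return the stored hash, or None if the file's mtime has
    changed since the hash was recorded -- approximating the
    "ephemeral xattr" idea in user space."""
    try:
        state = json.loads(os.getxattr(path, XATTR_STATE))
    except OSError:  # no xattr stored, or xattrs unsupported
        return None
    if state["mtime"] != os.stat(path).st_mtime:
        return None  # file changed since hashing: hash is invalid
    return state["sha256"]
```

The check is only as strong as mtime itself (an attacker can reset mtime after modifying a file), so this helps against accidental staleness, not tampering.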
What do you get by storing the hash in the xattr?
- different apps can collaboratively update/use that information (so apps start integrating with each other via xattr as their common denominator)
- e.g. you may skip recomputing the hash for a given file if e.g. Hashrat or shattr computed it earlier
- the metadata moves with files when I move them around (much like the xattr metadata recording the URL a file was downloaded from)