
cmd/subd: export a merkle hash of on-disk state

Open masiulaniec opened this issue 6 years ago • 17 comments

It would be good to have a metric that holds, for example, a float32 obtained by taking a 4-byte prefix of a SHA-512 of the merkle hash of the entire file system. Such values could be logged to time series databases, and used by monitoring systems for making sure that the managed hosts converge to the same bits. Subd already scans the file system and calculates hashes so deriving a merkle hash should not introduce much extra overhead.

masiulaniec avatar Jan 24 '19 04:01 masiulaniec
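
As a rough illustration of the metric proposed above, the following Go sketch condenses a root hash into a number by taking the first 4 bytes of its SHA-512. The function name `metricFromHash` and the sample inputs are hypothetical and not part of subd. The sketch stores the 32-bit prefix in a float64 instead of the float32 mentioned above: a float64 holds every 32-bit integer exactly, while reinterpreting four arbitrary bytes as float32 bits can produce NaN, which never compares equal to itself.

```go
package main

import (
	"crypto/sha512"
	"encoding/binary"
	"fmt"
)

// metricFromHash condenses a file-system hash into a numeric value that a
// time series database can store. Hypothetical helper, not part of subd.
func metricFromHash(rootHash []byte) float64 {
	sum := sha512.Sum512(rootHash)
	// Interpret the first 4 bytes of the digest as a big-endian integer.
	prefix := binary.BigEndian.Uint32(sum[:4])
	// Every uint32 value is exactly representable as a float64.
	return float64(prefix)
}

func main() {
	// Placeholder inputs standing in for the real root hash of each host.
	a := metricFromHash([]byte("state of host A"))
	b := metricFromHash([]byte("state of host B"))
	fmt.Println(a, b, a == b)
}
```

Hosts with identical state would report the same value, so a monitoring system only has to compare numbers. A 32-bit value can collide by chance, so equal metrics suggest convergence but do not prove it.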

I like the basic idea. Does it need to be a Merkle tree hash? That would require storing hashes in the directory inodes. A more straightforward implementation might be to have a modified hasher which hashes each hash that is computed. Note that, either way, this would only expose a hash of all the file data. Inode metadata would not be captured.

rgooch avatar Jan 24 '19 07:01 rgooch
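
The "hasher which hashes each hash" idea could look roughly like the Go sketch below. The type `aggregateHasher` and its methods are hypothetical names, not the hasher subd actually uses. The sketch assumes that files are visited in the same deterministic order on every host, because the running digest depends on the order in which file hashes are fed in.

```go
package main

import (
	"crypto/sha512"
	"fmt"
	"hash"
)

// aggregateHasher folds every per-file hash into one running digest.
// Hypothetical type for illustration only.
type aggregateHasher struct {
	running hash.Hash
}

func newAggregateHasher() *aggregateHasher {
	return &aggregateHasher{running: sha512.New()}
}

// hashFile hashes the file data, feeds the resulting digest into the
// running hash, and returns the per-file digest to the caller.
func (a *aggregateHasher) hashFile(data []byte) [sha512.Size]byte {
	sum := sha512.Sum512(data)
	a.running.Write(sum[:]) // Write on a hash.Hash never returns an error.
	return sum
}

// sum returns the digest over all file hashes seen so far.
func (a *aggregateHasher) sum() []byte {
	return a.running.Sum(nil)
}

func main() {
	agg := newAggregateHasher()
	// Placeholder contents standing in for files found during a scan.
	agg.hashFile([]byte("contents of /etc/hosts"))
	agg.hashFile([]byte("contents of /etc/passwd"))
	fmt.Printf("%x\n", agg.sum())
}
```

Unlike a Merkle tree, this keeps no per-directory hashes, so it can tell that two hosts differ but not where they differ.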

Agreed on all counts. I just wanted to put the basic idea in your head. I think metadata ought to be part of the hash.

masiulaniec avatar Jan 24 '19 19:01 masiulaniec

Metadata will be more complicated, but I agree that it's the kind of thing you'd want. Perhaps mtime should not be included?

rgooch avatar Jan 24 '19 22:01 rgooch

I can see wanting to exclude mtime for computed files.

masiulaniec avatar Jan 24 '19 23:01 masiulaniec

What about mtime for regular files?

rgooch avatar Jan 25 '19 05:01 rgooch

For regular files, mtime is image-defined and enforced just like any other attribute. I would include it.

masiulaniec avatar Jan 25 '19 16:01 masiulaniec

If mtimes for regular files are included in the hash, they will also be included for computed files, because as far as the sub is concerned, they are just regular files. It's only the Dominator that knows that they are computed files.

rgooch avatar Jan 25 '19 16:01 rgooch

Ack. So I don't see a reason to exclude mtime from the hash. We plan to do horizontal checks (host vs. host) and vertical (host vs. image).

masiulaniec avatar Jan 26 '19 03:01 masiulaniec

The mtime difference for computed files will make that difficult.

rgooch avatar Jan 28 '19 04:01 rgooch

The computed files will all have equal mtime thanks to os.Chtimes, no?

masiulaniec avatar Jan 29 '19 23:01 masiulaniec

The mtime for computed files is taken from the current time when the Dominator sees that the computed file contents need to be changed. So, in practice, every sub is going to have a different mtime for a particular computed file. There is no horizontal consistency.

rgooch avatar Jan 30 '19 01:01 rgooch

I can see two options: a) set the mtime anyway (I realize this could confuse tools such as rsync), b) present the hasher with zero mtime for computed files.

masiulaniec avatar Jan 30 '19 14:01 masiulaniec

I understand option b) would require dominator to start revealing to subd that certain files are computed, a classification detail that is currently beautifully hidden.

masiulaniec avatar Jan 30 '19 14:01 masiulaniec

Your suggestion of excluding mtime from hash computation sounds pragmatic: it would allow the feature to be implemented without expanding interfaces but would not preclude including mtime later if a clean design is found.

masiulaniec avatar Jan 30 '19 14:01 masiulaniec

Yes, excluding mtime from the hash seems the best for now. I'm reluctant to complicate subd unless it's essential.

rgooch avatar Jan 31 '19 16:01 rgooch
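
A minimal Go sketch of the outcome agreed on above: inode metadata goes into the hash, mtime does not. The struct `inodeMeta`, its fields, and `hashMeta` are hypothetical and do not mirror subd's real data structures. Which attributes belong in the hash is an assumption made for the example.

```go
package main

import (
	"crypto/sha512"
	"encoding/binary"
	"fmt"
)

// inodeMeta is a hypothetical, simplified view of a file's metadata.
type inodeMeta struct {
	Mode  uint32
	Uid   uint32
	Gid   uint32
	Mtime int64 // Unix seconds; deliberately left out of the hash.
}

// hashMeta combines mode, uid, gid and the content hash into one digest.
// Mtime is skipped, so computed files whose mtime differs from host to
// host still hash identically.
func hashMeta(m inodeMeta, contentHash []byte) [sha512.Size]byte {
	var buf [12]byte
	binary.BigEndian.PutUint32(buf[0:4], m.Mode)
	binary.BigEndian.PutUint32(buf[4:8], m.Uid)
	binary.BigEndian.PutUint32(buf[8:12], m.Gid)

	h := sha512.New()
	h.Write(buf[:])
	h.Write(contentHash)

	var out [sha512.Size]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	content := []byte("placeholder content hash")
	onHostA := inodeMeta{Mode: 0644, Uid: 0, Gid: 0, Mtime: 1548806400}
	onHostB := inodeMeta{Mode: 0644, Uid: 0, Gid: 0, Mtime: 1548892800}
	// Only the mtime differs, so the two digests are equal.
	fmt.Println(hashMeta(onHostA, content) == hashMeta(onHostB, content)) // prints true
}
```

Keeping Mtime in the struct but out of the digest leaves room to include it later without changing the interface between the Dominator and subd.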

Alternatively, the metric could be emitted at the level of the dominator server where the distinction between regular and computed files can still be made.

masiulaniec avatar Apr 13 '19 14:04 masiulaniec

Hm. Maybe we should take a step back and look at the problem you're trying to solve? Do you want to ensure that all machines converge to the required state and have alerting for machines which do not converge (after N attempts, say)? If that's what you're looking for, then the Dominator already knows this. It's currently presented in the dashboard and it could be exposed via metrics too.

rgooch avatar Apr 13 '19 16:04 rgooch