Dominator
cmd/subd: export a merkle hash of on-disk state
It would be good to have a metric that holds, for example, a float32 obtained by taking a 4-byte prefix of a SHA-512 of the merkle hash of the entire file system. Such values could be logged to time series databases, and used by monitoring systems for making sure that the managed hosts converge to the same bits. Subd already scans the file system and calculates hashes so deriving a merkle hash should not introduce much extra overhead.
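One way to read the proposal is sketched below in Go. The helper name `fingerprintMetric` and the choice to interpret the 4-byte prefix as a big-endian unsigned integer are assumptions for illustration, not anything subd does today. Converting the integer to float32 keeps only about 24 bits of precision, and reinterpreting the raw bits as a float32 instead could yield NaN or infinity, which most time series databases handle poorly.

```go
package main

import (
	"crypto/sha512"
	"encoding/binary"
	"fmt"
)

// fingerprintMetric is a hypothetical helper: it hashes an already-computed
// root hash with SHA-512 and folds the first 4 bytes into a float32 that
// could be exported as a gauge.
func fingerprintMetric(rootHash []byte) float32 {
	sum := sha512.Sum512(rootHash)
	// Interpret the 4-byte prefix as a big-endian unsigned integer.
	prefix := binary.BigEndian.Uint32(sum[:4])
	// float32 has a 24-bit significand, so the low bits are rounded away.
	return float32(prefix)
}

func main() {
	// Placeholder inputs standing in for real root hashes.
	hostA := fingerprintMetric([]byte("root hash from host A"))
	hostB := fingerprintMetric([]byte("root hash from host A"))
	fmt.Println(hostA == hostB) // identical state yields an identical metric
}
```

Two hosts with identical root hashes report the same value, so a monitoring system only needs to compare gauges across hosts.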
I like the basic idea. Does it need to be a Merkle tree hash? That would require storing hashes in the directory inodes. A more straightforward implementation might be to have a modified hasher which hashes each hash that is computed. Note that, either way, this would only expose a hash of all the file data. Inode metadata would not be captured.
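The "hash each hash" alternative could look roughly like the following Go sketch. The `rollupHasher` type and the `rollup` helper are hypothetical names, and the sketch assumes files are visited in a deterministic order, since a running hash is sensitive to the order of its inputs.

```go
package main

import (
	"crypto/sha512"
	"fmt"
	"hash"
)

// rollupHasher is a hypothetical wrapper that keeps a running hash over
// every per-file hash it is given, avoiding a per-directory tree.
type rollupHasher struct {
	h hash.Hash
}

func newRollupHasher() *rollupHasher {
	return &rollupHasher{h: sha512.New()}
}

// addFile hashes the file contents and feeds that hash into the running hash.
func (r *rollupHasher) addFile(contents []byte) {
	fileHash := sha512.Sum512(contents)
	r.h.Write(fileHash[:]) // hash.Hash.Write never returns an error
}

// sum returns the hash over all file hashes added so far.
func (r *rollupHasher) sum() []byte {
	return r.h.Sum(nil)
}

// rollup hashes the given file contents in the order supplied.
func rollup(files ...string) string {
	r := newRollupHasher()
	for _, f := range files {
		r.addFile([]byte(f))
	}
	return fmt.Sprintf("%x", r.sum())
}

func main() {
	fmt.Println(rollup("a", "b") == rollup("a", "b")) // same files, same order
	fmt.Println(rollup("a", "b") == rollup("b", "a")) // same files, different order
}
```

The second comparison shows why the scan order has to be stable across hosts: the same set of files visited in a different order produces a different result.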
Agreed on all counts. I just wanted to put the basic idea in your head. I think metadata ought to be part of the hash.
Metadata will be more complicated, but I agree that it's the kind of thing you'd want. Perhaps mtime should not be included?
I can see wanting to exclude mtime for computed files.
What about mtime for regular files?
For regular files, mtime is image-defined and enforced just like any other attribute. I would include it.
If mtimes for regular files are included in the hash, they will also be included for computed files, because as far as the sub is concerned, they are just regular files. It's only the Dominator that knows that they are computed files.
Ack. So I don't see a reason to exclude mtime from the hash. We plan to do horizontal checks (host vs. host) and vertical (host vs. image).
The mtime difference for computed files will make that difficult.
The computed files will all have equal mtime thanks to os.Chtimes, no?
The mtime for computed files is taken from the current time when the Dominator sees that the computed file contents need to be changed. So, in practice, every sub is going to have a different mtime for a particular computed file. There is no horizontal consistency.
I can see two options: a) set the mtime anyway (I realize this could confuse tools such as rsync), b) present the hasher with zero mtime for computed files.
I understand option b) would require dominator to start revealing to subd that certain files are computed, a classification detail that is currently beautifully hidden.
Your suggestion of excluding mtime from hash computation sounds pragmatic: it would allow the feature to be implemented without expanding interfaces but would not preclude including mtime later if a clean design is found.
Yes, excluding mtime from the hash seems the best for now. I'm reluctant to complicate subd unless it's essential.
Alternatively, the metric could be emitted at the level of the dominator server where the distinction between regular and computed files can still be made.
Hm. Maybe we should take a step back and look at the problem you're trying to solve? Do you want to ensure that all machines converge to the required state and have alerting for machines which do not converge (after N attempts, say)? If that's what you're looking for, then the Dominator already knows this. It's currently presented in the dashboard and it could be exposed via metrics too.