slsa-verifier
Support directory hashes
As part of the effort to bring SLSA to ML (https://github.com/google/model-transparency), we need to be able to sign directories. This requires defining a new "hash", i.e. how to serialize a directory. We have a PoC for this in the repo linked above and need to implement it in slsa-verifier.
/cc @mihaimaruseac @ramonpetgrave64
@smeiklej
jsonnet-bundler has a small utility method to generate the hash of a directory which might be useful here as well: https://github.com/jsonnet-bundler/jsonnet-bundler/blob/master/pkg/packages.go#L351
This code is not safe from a cryptographic hash point of view, e.g. you can rename files, which changes their meaning, without changing the hash. The hash we have in the model repo also handles parallel hashing using a tree. See the comments in https://github.com/google/model-transparency/issues/49
An even greater problem with the hash is that it lacks delimiters between files. So the directory F1: "hello", F2: "world" will produce the same hash as the directory F1: "hell", F2: "oworld".
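To make the collision concrete, here is a minimal sketch in Python. It is a simplified model of the flaw described above, not the jsonnet-bundler code: `naive_dir_hash` and `framed_dir_hash` are hypothetical names, and directories are modelled as in-memory dicts mapping file name to content. Length-prefixing names and contents is one common way to make the boundaries unambiguous; it is an assumption here, not necessarily what the model-transparency PoC does.

```python
import hashlib


def naive_dir_hash(files):
    """Feed file contents into one hash in name order, with no delimiters."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(files[name])
    return h.hexdigest()


def framed_dir_hash(files):
    """Length-prefix every name and content so file boundaries are unambiguous."""
    h = hashlib.sha256()
    for name in sorted(files):
        encoded_name = name.encode("utf-8")
        h.update(len(encoded_name).to_bytes(8, "big"))
        h.update(encoded_name)
        h.update(len(files[name]).to_bytes(8, "big"))
        h.update(files[name])
    return h.hexdigest()


dir_a = {"F1": b"hello", "F2": b"world"}
dir_b = {"F1": b"hell", "F2": b"oworld"}
renamed = {"G1": b"hello", "G2": b"world"}

# Both directories concatenate to b"helloworld", so the naive hash collides.
print(naive_dir_hash(dir_a) == naive_dir_hash(dir_b))
# With framing, moving a byte across a file boundary changes the hash.
print(framed_dir_hash(dir_a) == framed_dir_hash(dir_b))
# Including names also makes renames visible.
print(framed_dir_hash(dir_a) == framed_dir_hash(renamed))
```

The naive variant also ignores file names entirely, which is the renaming issue raised earlier in the thread.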
OK, I did not realize that the directory hash should also take that into account.
Maybe tree hashes as calculated by git would be useful. Here is a test I performed: I created files with the same content but different filenames in different directories and looked at how git calculates the hashes.
If the filename is the same, the tree hash is the same; if the filename differs, the tree hash differs as well.
tn@proteus:~/workspace/eclipse/EclipseFdn/tmp$ git ls-tree HEAD
040000 tree 1e6dbf97adb05c42dcb537cd717e368812dc23b5 test
040000 tree 844053933521d6c52f2f96e288dc9175a2e6aea0 test2
040000 tree 1e6dbf97adb05c42dcb537cd717e368812dc23b5 test3
tn@proteus:~/workspace/eclipse/EclipseFdn/tmp$ git ls-tree -r HEAD
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238 test/test.txt
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238 test2/test2.txt
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238 test3/test.txt
This could work, but it forces the existence of a .git directory and ties us to git's hashing algorithm.
Sorry for the misunderstanding, I did not intend to suggest using git itself, but rather its mechanism for generating tree hashes.
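For reference, git's mechanism can be reproduced without a .git directory. The sketch below computes SHA-1 blob and tree object IDs in Python for a flat directory of regular files. `git_blob_id` and `git_tree_id` are hypothetical helper names; subdirectories (which git stores with mode `40000` and sorts as if the name ended in `/`) are deliberately left out to keep it short.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a file's content: sha1("blob <size>\\0" + content)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def git_tree_id(entries) -> str:
    """Object ID of a tree made only of regular files.

    entries: iterable of (mode, name, hex_id), e.g. ("100644", "test.txt", blob_id).
    """
    body = b""
    # Plain byte-wise name order is enough for file-only trees.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    header = b"tree %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


# Same content everywhere, as in the experiment above.
blob = git_blob_id(b"same content\n")
tree_test = git_tree_id([("100644", "test.txt", blob)])
tree_test2 = git_tree_id([("100644", "test2.txt", blob)])
tree_test3 = git_tree_id([("100644", "test.txt", blob)])

print(tree_test == tree_test3)  # same file name, same tree ID
print(tree_test == tree_test2)  # different file name, different tree ID
```

Because the file name and mode are part of the tree object, renaming a file changes the tree ID even when the blob ID stays the same, which matches the `git ls-tree` output above. The same construction would work with another hash function in place of SHA-1.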
Oh, fair point. Thanks for the clarification.
Just adding to the conversation:
Merkle trees seem like they could be a good way to hash directories, and someone has tried this in Go.
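A rough sketch of what a Merkle-style directory hash could look like, in Python with SHA-256. This is not the Go project mentioned above nor the model-transparency PoC: `merkle_dir_hash`, the `0x00`/`0x01` domain-separation prefixes, and the entry encoding are all assumptions made for illustration. Symlinks and file metadata are ignored.

```python
import hashlib
import os
import tempfile


def _leaf(content: bytes) -> bytes:
    """Digest of a file; the 0x00 prefix separates leaves from directory nodes."""
    return hashlib.sha256(b"\x00" + content).digest()


def merkle_dir_hash(path: str) -> bytes:
    """Digest of a directory, computed from its sorted entries' names and digests."""
    h = hashlib.sha256(b"\x01")
    for name in sorted(os.listdir(path)):
        child = os.path.join(path, name)
        if os.path.isdir(child):
            kind, digest = b"d", merkle_dir_hash(child)
        else:
            with open(child, "rb") as f:
                kind, digest = b"f", _leaf(f.read())
        encoded = os.fsencode(name)
        # Length-prefixed name plus fixed-size digest keeps entries unambiguous.
        h.update(kind + len(encoded).to_bytes(8, "big") + encoded + digest)
    return h.digest()


def _write(root: str, rel: str, data: bytes) -> None:
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    _write(tmp, "test/test.txt", b"same")
    _write(tmp, "test2/test2.txt", b"same")
    _write(tmp, "test3/test.txt", b"same")
    hash_test = merkle_dir_hash(os.path.join(tmp, "test"))
    hash_test2 = merkle_dir_hash(os.path.join(tmp, "test2"))
    hash_test3 = merkle_dir_hash(os.path.join(tmp, "test3"))

print(hash_test == hash_test3, hash_test == hash_test2)
```

Since each subtree's digest depends only on that subtree, subdirectories could be hashed in parallel and combined afterwards; this sketch does it sequentially.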
Re: your comments, I think we could add an optional CLI switch to slsa-verifier like --enforce-subject-name-and-path, and then, if the slsa-github-generator doesn't already, it could put the relative paths in the subject.name.
Thank you! We're now also experimenting with a manifest file instead of a hash of everything, but this probably won't work for SLSA (https://github.com/google/model-transparency/issues/111). Let's continue experimenting.
SLSA will replace the manifest format with a provenance format; the rest can probably remain the same.