monobuild
monobuild copied to clipboard
Dependency-aware cache key (to allow memoization of build steps)
Feature request: a monobuild
command that calculates a hash for a component (or list of components).
This would facilitate the ability to memoize build steps (e.g. when using an artefact cache).
Each hash would be calculated as a Merkle hash of itself and all of its dependencies' hashes.
Some thoughts on this, these are more my notes so that I don't forget the thinking:
- It's much easier to do this on git's Merkle tree (i.e. a committed revision in git) than on the working directory. (And it's trivial to commit a working directory, even temporarily).
- Calculating the cache key is equivalent to filtering the repo down to only the contents of the directories of the given component and its dependencies, nothing else contributes to its cache id.
That means that in order to construct a cache key of a given component (and its dependencies), we can:
- Find all (strong and weak) dependencies of the component - this will produce a list of paths, e.g.
These are path prefixes we want to keepapp1 libs/lib1 libs/lib3
- Recursively prune the full git tree to only contain files which match the prefix, i.e.
- Turn the list of paths into a tree structure (using the
/
file delimiter) - the "filter tree", e.g. for the above:app1 libs lib1 lib3
- Walk the filter tree and the git file tree at the same time, until we reach the leaves of the filter tree
- On the way up, format trees (using
mk-tree
or equivalent) that only contain the paths in the filter tree, discard everything else - At the top, we end up with a new tree file
- Turn the list of paths into a tree structure (using the
- The new root tree oid (sha) is our cache key.
When any of the leaves (i.e. components) change, their tree file will be different and therefore the resulting root tree will be different, resulting in a different cache id.
If components are nested in one another, it forms implied weak dependencies (changing a sub-component changes the tree of its parent component as well) so everything still works as expected.
The nice thing is this scales with the number of dependencies of a component, not the size of the repo, so it should be pretty fast even on huge monorepos, unless the dependency graph is massive (at which point you've got a bigger problem to worry about).
It's not particularly hard to implement this, but I wonder whether there are git plumbing commands that would let us just do it.
This looks very useful http://alx.github.io/gitbook/7_raw_git.html
Sounds like Nix to me :) https://nixos.org/nix/