buildcache
Flexible hash calculation
- Sometimes one wants a stricter compiler identification, say the build ID or the md5sum of the compiler binary.
- On the other hand, often the exact version of the compiler does not matter; for instance, one might want to consider `g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0` and `g++ (Ubuntu 7.4.0-1ubuntu2~18.04.1)` as completely equivalent.
- One might want to include the linemarkers of the preprocessed file in the hash calculation (to get precise debugging info when compiling with `-O0 -g` and/or `--coverage`). On the other hand, it's reasonable to ignore them when compiling with optimization enabled (since line numbers in the debugging info can't be accurate anyway).
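To make the first two points concrete, here is a minimal Python sketch of a "lax" and a "strict" program ID. It is only an illustration of the idea, not buildcache code: the function names and the regular expression for the distro packaging suffix are my own assumptions.

```python
import hashlib
import re

# Assumed pattern: an upstream version such as "7.4.0" followed by a
# Debian/Ubuntu-style packaging suffix such as "-1ubuntu1~18.04.1".
_PACKAGING_SUFFIX = re.compile(r"(\d+\.\d+\.\d+)-[0-9A-Za-z.+~]+")


def lax_program_id(version_line: str) -> str:
    """Hash the version line after dropping the packaging suffix (hypothetical helper)."""
    normalized = _PACKAGING_SUFFIX.sub(r"\1", version_line)
    return hashlib.sha256(normalized.encode()).hexdigest()


def strict_program_id(compiler_path: str) -> str:
    """Hash the compiler binary itself, read in chunks (hypothetical helper)."""
    h = hashlib.sha256()
    with open(compiler_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

With the lax variant, two version lines that differ only in the packaging revision (`1ubuntu1` vs `1ubuntu2`) produce the same ID. The strict variant changes whenever the binary changes, but it still says nothing about shared libraries or data files the compiler loads.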
Surely one can implement such (and many other) things in Lua. However, as far as I understand, this requires writing a complete wrapper, even if one only wants to reimplement `get_program_id()`.
`get_program_id()` is one of the trickiest and least stringent parts of the whole hashing process, for sure. The version string is not a 1:1 ID, as you say (both 1:N and N:1 are possible). The executable binary is not unique either, especially considering shared library and data dependencies that could affect compilation results.
Having line info in the preprocessed file is problematic when using a shared cache (e.g. Redis, or when having multiple local build folders), since absolute paths may differ and thus cause cache misses.
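A rough Python sketch of that trade-off, assuming GCC-style linemarkers of the form `# 42 "/abs/path/file.h" 1 3`. The helper names and the flag policy are illustrative assumptions taken from this discussion, not what buildcache actually does:

```python
import hashlib
import re
from typing import List

# Matches GCC-style linemarkers and "#line" directives, e.g.
#   # 1 "/home/alice/src/main.cpp" 1 3
_LINEMARKER = re.compile(r'^\s*#\s*(?:line\s+)?\d+(?:\s+"[^"]*")?(?:\s+\d+)*\s*$')


def hash_preprocessed(text: str, keep_linemarkers: bool) -> str:
    """Hash preprocessed source, optionally dropping linemarker lines first."""
    lines = text.splitlines()
    if not keep_linemarkers:
        lines = [ln for ln in lines if not _LINEMARKER.match(ln)]
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()


def wants_linemarkers(args: List[str]) -> bool:
    """Assumed policy: keep linemarkers for --coverage or unoptimized -g builds.

    Simplified on purpose: only a plain "-g" counts as debug info, and the
    last -O flag on the command line wins (no -O flag means -O0).
    """
    opt = "-O0"
    for arg in args:
        if arg.startswith("-O"):
            opt = arg
    return "--coverage" in args or ("-g" in args and opt == "-O0")
```

When linemarkers are dropped, the same source preprocessed in two different build folders hashes identically, which is what a shared cache needs. The cost is that cached objects may carry debug paths from another checkout.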
That said, having a more flexible/configurable hashing system would be useful.
I see three paths forward (not mutually exclusive):
- Incrementally build better heuristics in the individual wrappers - try to catch odd cases etc.
- Add user configuration options (wrapper specific and/or general) - e.g. for making certain parts of the hashing more lax (optimize for performance) or more strict (optimize for correctness).
- Expand the Lua system so that you can override individual parts of a predefined wrapper (e.g. add "override gcc" to the first line). I'm not sure how complex it would be, or how it would affect performance, but it's certainly doable.
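To illustrate the third option, here is a small Python sketch of the "override one step of a predefined wrapper" idea. The class names and the `cache_key` method are hypothetical and only model the concept; the real mechanism would live in the Lua wrapper system.

```python
import hashlib
from typing import List


class PredefinedGccWrapper:
    """Hypothetical stand-in for a built-in wrapper with separate hashing steps."""

    def get_program_id(self, version_line: str) -> str:
        # Default behaviour in this sketch: use the full version line.
        return version_line

    def cache_key(self, version_line: str, args: List[str], preprocessed: str) -> str:
        # Combine the individual steps; each part is terminated by a NUL byte
        # so that adjacent parts cannot run together.
        h = hashlib.sha256()
        for part in (self.get_program_id(version_line), "\0".join(args), preprocessed):
            h.update(part.encode())
            h.update(b"\0")
        return h.hexdigest()


class LaxVersionWrapper(PredefinedGccWrapper):
    """Overrides only get_program_id(); everything else is inherited."""

    def get_program_id(self, version_line: str) -> str:
        # Assumption: the last whitespace-separated token is the upstream version.
        return version_line.split()[-1]
```

The point of the design is that a user who only cares about `get_program_id()` writes a few lines instead of a complete wrapper, while argument handling and preprocessing stay with the predefined implementation.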
@asheplyakov What are your feelings regarding this issue? Now that we have `BUILDCACHE_ACCURACY`, at least parts of the issue have been resolved IMO.