CacheDir duplicates cache entry for identical target outputs
Describe the Feature
Originally discussed at https://github.com/platformio/platformio-core/issues/4574 PlatformIO 'environments' are separated as VariantDirs, and often include similar target outputs SDK and library files. Every time environment is built, cache is populated exclusively for that environment alone; it is not possible to share cache entries between environments, thus increasing total compilation time of otherwise exact same project files.
With the following test construct taken from the issue
CacheDir('build_cache')
VariantDir('build1', 'src', duplicate=False)
VariantDir('build2', 'src', duplicate=False)
env1 = Environment()
env1.Program('build1/hello.c')
env2 = Environment()
env2.Program('build2/hello.c')
Empty build_cache and clean build directory, build1 variant is compiled from scratch as expected.
Immediately trying to build build2 generates files from scratch as well, but since it is identical to the build1 one would expect the output files would be taken from build1 cache.
Currently, this does not work like that because of hard-coded path included for the CacheDir entry key, in the FS Node code method https://github.com/SCons/scons/blob/c80cbb0846c5d9265f356e86c4e79f1d13e3e8a8/SCons/Node/FS.py#L3705-L3717
Main question is - what is the best approach to handle such case?
Naive solution would be to simply allow to modify this path, as mentioned in the original issue
https://github.com/SCons/scons/compare/master...mcspr:scons:pio4574v2
FS node now has set_cachedir_bsig_path() method, which can be used to substitute that path; either set to some common prefix + filename, or simply removed.
But, I am not sure how that would apply to the env.Program() case, since it does not allow target factory (or, for anything in-between like .o files). Or, whether modifying FS Node code is the best place for this change to happen.
Required information
- Link to SCons Users thread discussing your issue. https://github.com/platformio/platformio-core/issues/4574
- Version of SCons v4.5.2
- Version of Python 3.11.2
I encountered this behavior at one point as well, and considered whether it was worth fixing. There are, however, some critical wrinkles. For this to work the build environment must guarantee that no path information ends up in the object files. If you have, for instance, debug info that records something like .dwo paths or split dwarf info or any other information like that, then what would otherwise be equivalent object files will end up with different content. If you allow those files to be considered equivalent via the cache by stripping the variant directory from consideration then you can end up retrieving the "wrong" file. Is it a problem in practice? Maybe or maybe not. But think if this behavior is desired it needs to be opt-in.
As I understood it (and I might not have) .. wouldn't preprocessor stuff like "Current file" or "Line number" be rendered before hashing and checking the cache?
If so, that would make two different sources, which yields two different outputs.
As I understood it (and I might not have) .. wouldn't preprocessor stuff like "Current file" or "Line number" be rendered before hashing and checking the cache?
If so, that would make two different sources, which yields two different outputs.
If duplicate=False, then the sources and preprocessor stuff would all be the same..so no. But some tools embed the target path/name in the output files.
Setting aside embedding, I wonder if fiddling the relevant for_signature could help - if a variantdir is active, don't include the variantdir part in the calculation. The interpretation of what to include is claimed to be fairly arbitrary:
... The purpose
of this method is to generate a value to be used in signature
calculation for the command line used to build a target, and
we use this method instead of str() to avoid unnecessary
rebuilds. This method does not need to return something that
would actually work in a command line; it can return any kind of
nonsense, so long as it does not change.
Resurrecting this because I hit the same issue. I was hoping to use the cache for two different VariantDirs, but because the output path is included in the hash, this did not work. For me the issue is not only in relation to VariantDirs, as you would also have the problem when two different projects use the same build artifact, but in different places in the project tree. The cache would hold both entries as they are perceived different. I agree with @acmorrow though that in general this the sensible way to go. Better have a more restrictive cache, than one that gets stuff wrong. Maybe we could have an option to select between hashing the full path and hashing only the file name? Projects that deem the latter acceptable could improve their usage of the cache this way. On a side note: I was surprised to learn that the hash does not include the name of the builder producing that file.
@diedricm - to some extent the builder name is irrelevant. It's the actions the builder provides to convert the sources to the targets, and those are in the hash..
@diedricm - to some extent the builder name is irrelevant. It's the actions the builder provides to convert the sources to the targets, and those are in the hash.. I did not know that! When I tried to change the name of the action it did not result in changes in the build signature. I tested it now again and found that changes to the code of the Action change the signature. Adding a return though also had no influence.