scons icon indicating copy to clipboard operation
scons copied to clipboard

CacheDir duplicates cache entry for identical target outputs

Open mcspr opened this issue 2 years ago • 7 comments

Describe the Feature

Originally discussed at https://github.com/platformio/platformio-core/issues/4574 PlatformIO 'environments' are separated as VariantDirs, and often include similar target outputs SDK and library files. Every time environment is built, cache is populated exclusively for that environment alone; it is not possible to share cache entries between environments, thus increasing total compilation time of otherwise exact same project files.

With the following test construct taken from the issue

CacheDir('build_cache')
VariantDir('build1', 'src', duplicate=False)
VariantDir('build2', 'src', duplicate=False)
env1 = Environment()
env1.Program('build1/hello.c')
env2 = Environment()
env2.Program('build2/hello.c')

Empty build_cache and clean build directory, build1 variant is compiled from scratch as expected. Immediately trying to build build2 generates files from scratch as well, but since it is identical to the build1 one would expect the output files would be taken from build1 cache.

Currently, this does not work like that because of hard-coded path included for the CacheDir entry key, in the FS Node code method https://github.com/SCons/scons/blob/c80cbb0846c5d9265f356e86c4e79f1d13e3e8a8/SCons/Node/FS.py#L3705-L3717

Main question is - what is the best approach to handle such case? Naive solution would be to simply allow to modify this path, as mentioned in the original issue https://github.com/SCons/scons/compare/master...mcspr:scons:pio4574v2 FS node now has set_cachedir_bsig_path() method, which can be used to substitute that path; either set to some common prefix + filename, or simply removed.

But, I am not sure how that would apply to the env.Program() case, since it does not allow target factory (or, for anything in-between like .o files). Or, whether modifying FS Node code is the best place for this change to happen.

Required information

  • Link to SCons Users thread discussing your issue. https://github.com/platformio/platformio-core/issues/4574
  • Version of SCons v4.5.2
  • Version of Python 3.11.2

mcspr avatar Apr 13 '23 13:04 mcspr

I encountered this behavior at one point as well, and considered whether it was worth fixing. There are, however, some critical wrinkles. For this to work the build environment must guarantee that no path information ends up in the object files. If you have, for instance, debug info that records something like .dwo paths or split dwarf info or any other information like that, then what would otherwise be equivalent object files will end up with different content. If you allow those files to be considered equivalent via the cache by stripping the variant directory from consideration then you can end up retrieving the "wrong" file. Is it a problem in practice? Maybe or maybe not. But think if this behavior is desired it needs to be opt-in.

acmorrow avatar Apr 13 '23 14:04 acmorrow

As I understood it (and I might not have) .. wouldn't preprocessor stuff like "Current file" or "Line number" be rendered before hashing and checking the cache?

If so, that would make two different sources, which yields two different outputs.

LordMike avatar Apr 14 '23 10:04 LordMike

As I understood it (and I might not have) .. wouldn't preprocessor stuff like "Current file" or "Line number" be rendered before hashing and checking the cache?

If so, that would make two different sources, which yields two different outputs.

If duplicate=False, then the sources and preprocessor stuff would all be the same..so no. But some tools embed the target path/name in the output files.

bdbaddog avatar Apr 15 '23 01:04 bdbaddog

Setting aside embedding, I wonder if fiddling the relevant for_signature could help - if a variantdir is active, don't include the variantdir part in the calculation. The interpretation of what to include is claimed to be fairly arbitrary:

        ...    The purpose         
        of this method is to generate a value to be used in signature          
        calculation for the command line used to build a target, and           
        we use this method instead of str() to avoid unnecessary               
        rebuilds.  This method does not need to return something that          
        would actually work in a command line; it can return any kind of       
        nonsense, so long as it does not change.                 

mwichmann avatar Aug 27 '23 17:08 mwichmann

Resurrecting this because I hit the same issue. I was hoping to use the cache for two different VariantDirs, but because the output path is included in the hash, this did not work. For me the issue is not only in relation to VariantDirs, as you would also have the problem when two different projects use the same build artifact, but in different places in the project tree. The cache would hold both entries as they are perceived different. I agree with @acmorrow though that in general this the sensible way to go. Better have a more restrictive cache, than one that gets stuff wrong. Maybe we could have an option to select between hashing the full path and hashing only the file name? Projects that deem the latter acceptable could improve their usage of the cache this way. On a side note: I was surprised to learn that the hash does not include the name of the builder producing that file.

diedricm avatar Jun 19 '25 14:06 diedricm

@diedricm - to some extent the builder name is irrelevant. It's the actions the builder provides to convert the sources to the targets, and those are in the hash..

bdbaddog avatar Jun 19 '25 23:06 bdbaddog

@diedricm - to some extent the builder name is irrelevant. It's the actions the builder provides to convert the sources to the targets, and those are in the hash.. I did not know that! When I tried to change the name of the action it did not result in changes in the build signature. I tested it now again and found that changes to the code of the Action change the signature. Adding a return though also had no influence.

diedricm avatar Jun 20 '25 22:06 diedricm