scons icon indicating copy to clipboard operation
scons copied to clipboard

Catching AttributeError which may occur when executing multi-threaded

Open bwuerkner opened this issue 2 years ago • 10 comments

I don't have a testcase for this because I don't really understand why or how this is happening but I can 100% reproduce it locally with a rather complicated setup.

The setup consists amongst other things of custom Builder and Action implementations.

The error I observe only occurs when running with -j2 or higher, it never occurs single threaded. It also goes away if I just call SCons again after it failed and then proceeds as expected. Running a --clean provokes the error again on next invocation.

The error is:

AttributeError : module 'SCons.Tool.gcc' has no attribute 'generate'
Traceback (most recent call last):
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 1286, in get_CacheDir
    path = self._CacheDir_path
           ^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 2540, in __getattr__
    attr = getattr(self.__dict__['__subject'], name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SConsEnvironment' object has no attribute '_CacheDir_path'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Taskmaster/__init__.py", line 231, in execute
    if not t.retrieve_from_cache():
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Node/FS.py", line 3034, in retrieve_from_cache
    return self.get_build_env().get_CacheDir().retrieve(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 1288, in get_CacheDir
    path = SCons.Defaults.DefaultEnvironment()._CacheDir_path
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Defaults.py", line 85, in DefaultEnvironment
    _default_env = SCons.Environment.Environment(*args, **kw)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 1248, in __init__
    apply_tools(self, tools, toolpath)
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 117, in apply_tools
    _ = env.Tool(tool)
        ^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 2033, in Tool
    tool(self)
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/__init__.py", line 265, in __call__
    self.generate(env, *args, **kw)
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/default.py", line 40, in generate
    for t in SCons.Tool.tool_list(env['PLATFORM'], env):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/__init__.py", line 769, in tool_list
    c_compiler = FindTool(c_compilers, env) or c_compilers[0]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/__init__.py", line 671, in FindTool
    t = Tool(tool)
        ^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/__init__.py", line 119, in __init__
    self.generate = module.generate
                    ^^^^^^^^^^^^^^^
AttributeError: module 'SCons.Tool.gcc' has no attribute 'generate'

The AttributeError module is not consistent either, I've seen at least: AttributeError: module 'SCons.Tool.default' has no attribute 'generate' AttributeError: module 'SCons.Tool.gcc' has no attribute 'generate'

Also I'm not using Tool.gcc at all in this Project

Contributor Checklist:

  • [ ] I have created a new test or updated the unit tests to cover the new/changed functionality.
  • [x] I have updated CHANGES.txt (and read the README.rst)
  • [x] I have updated the appropriate documentation

bwuerkner avatar Sep 29 '23 08:09 bwuerkner

That's kind of scary - it means there's a way for the taskmaster to try retrieving the build environment of a task that doesn't really have one. All the rest is a consequence, we shouldn't really be going down the route of constructing a default environment with the initialization that goes with it. I'm suspicious this means something has a NullEnvironment where it shouldn't.

mwichmann avatar Sep 29 '23 14:09 mwichmann

Hmmm... the comment in retrieve_from_cache says:

    def retrieve_from_cache(self) -> bool:                                     
        """Try to retrieve the node's content from a cache                     
                                                                               
        This method is called from multiple threads in a parallel build,       
        so only do thread safe stuff here. 
        ... """

Following the thread of execution down to the instantiation of DefaultEnvironment gets into a place that is far from "atomic", with all the tool initialization... in fact, DefaultEnvironment() itself may be racy as it checks a global flag before going off and doing expensive stuff and then rewrites itself - but a second entry before that finishes is likely to go somewhere we don't want.

mwichmann avatar Sep 29 '23 15:09 mwichmann

Can you describe or share your custom Builder/Action? Version of Python? Version of SCons?

bdbaddog avatar Sep 29 '23 15:09 bdbaddog

@bdbaddog

Python 3.11.4 Linux (Ubuntu Focal) SCons 4.5.2 (pip package) AND also tried with SCons on master branch

I'll try to summarize them and give some code snippets. Generally, what I'm doing, is create a lot of targets in a loop based on the input from a JSON file. This has been working for many months now actually and is still working with all existing JSON files that were previously working. What's not working is a new JSON file that doesn't seem to have any relevant diff to the working ones. All of this to say that: This is working in certain circumstances and not in others which makes it hard to narrow down.

Now for the setup:

custom_tool.py

def exists(env):
    return True

def generate(env):
    some_action = Action(internal_function_to_execute_some_command, strfunction=internal_function_for_name)
    bld = Builder(action=some_action)
    env.Append(BUILDERS={"custom_tool": bld})

The internal_function_to_execute_some_command does all the heavy lifting, but even if I just immediately return 0 the error is triggered

Aside from that I have a SConstruct file that contains the loading logic, some error handling and parameter setting and the following:

SConstruct

env = Environment(
    variables = cmdline_params,
    tools=['custom_tool'],
    toolpath=['path/to/custom/tool'], 
    ENV=os.environ,
    overall_start_time = time.time(),
    db = None,
    event=None,
    execution_log={}
)

and

    for entry in data:
        target = env.custom_tool(
                source=entry.source,
                target=entry.dest
            )
        targets.append(target)

bwuerkner avatar Sep 29 '23 17:09 bwuerkner

Hmmm... the comment in retrieve_from_cache says:

    def retrieve_from_cache(self) -> bool:                                     
        """Try to retrieve the node's content from a cache                     
                                                                               
        This method is called from multiple threads in a parallel build,       
        so only do thread safe stuff here.  ..."""

Following the thread of execution down to the instantiation of DefaultEnvironment gets into a place that is far from "atomic", with all the tool initialization, is far from that... in fact, DefaultEnvironment() itself may be racy as it checks a global flag before going off and doing expensive stuff and then rewrites itself - but a second entry before that finishes is likely to go somewhere we don't want.

I did also find it weird that a function related to the cache is called when there's no cachedir defined. What you're describing sounds like it could be the root cause. As outlined in my other comment the race condition only appears with certain input and doesn't for other.

bwuerkner avatar Sep 29 '23 17:09 bwuerkner

just for grins, if you're not actually using the Default Environment (that is, your calls are all of the form env.Mybuilder, env.Append, etc.), could you add:

DefaultEnvironment(tools=[])

This isn't a fix, it's just an experiment which might help because of two factors:

  1. forces the initialization of DefaultEnvironment to happen early
  2. makes it much quicker

There still something quite interesting going on that it's not yet clear how to drill down on.

mwichmann avatar Sep 29 '23 18:09 mwichmann

Can you expand on what internal_function_to_execute_some_command does? Even a sanitized code snippet could provide some insight.. Are you calling any builders in that action?

bdbaddog avatar Sep 29 '23 18:09 bdbaddog

DefaultEnvironment(tools=[])

That does fix the issue so I guess Experiment succeeded.

Can you expand on what internal_function_to_execute_some_command does? Even a sanitized code snippet could provide some insight.. Are you calling any builders in that action?

No I'm not calling any builders or other SCons related content in that function. It is rather simple and as outlined above the content doesn't actually seem to matter. Currently it looks like this:

def internal_function_to_execute_some_command(target, source, env):
    return 0

and with that I can trigger the issue without a problem. I'm somewhat assuming it has to do with the parallel initialization of the Builders after their initial creation.

I didn't clearly state it in my original message but before the error occurs I get

scons: done reading SConscript files.
scons: Building targets ...

I'm also starting to think that this PR is not gonna be necessary since the manual initialization prevents the problem in a way more stable way.

bwuerkner avatar Sep 30 '23 17:09 bwuerkner

DefaultEnvironment(tools=[])

That does fix the issue so I guess Experiment succeeded.

Can you expand on what internal_function_to_execute_some_command does? Even a sanitized code snippet could provide some insight.. Are you calling any builders in that action?

No I'm not calling any builders or other SCons related content in that function. It is rather simple and as outlined above the content doesn't actually seem to matter. Currently it looks like this:

def internal_function_to_execute_some_command(target, source, env):
    return 0

and with that I can trigger the issue without a problem. I'm somewhat assuming it has to do with the parallel initialization of the Builders after their initial creation.

I didn't clearly state it in my original message but before the error occurs I get

scons: done reading SConscript files.
scons: Building targets ...

I'm also starting to think that this PR is not gonna be necessary since the manual initialization prevents the problem in a way more stable way.

It's very odd we've never seen this failure mode before, if nothing special about your usage is triggering it. Does your builder define an emitter or a scanner?

bdbaddog avatar Sep 30 '23 23:09 bdbaddog

I'm also starting to think that this PR is not gonna be necessary since the manual initialization prevents the problem in a way more stable way.

It's very odd we've never seen this failure mode before, if nothing special about your usage is triggering it.

I agree, that's odd. I think the code path looks vulnerable, but if it actually is (rather than just in theory), why hasn't it been seen over the last 20 years? It may be time to take a deeper look at revamping the way the Default Environment is initialized. There's a form of deferred initialization now, but as we've talked about in other threads (and as mentioned in #1069 - apparently some work was once started on this) what we probably really want is deferring the tool setup in the default environment. The above suggestion defers tool setup by not doing it at all on that environment, but it seems rather a hack - why do you need to do something special if you know you're not going to call on the environment yourself (and then what if you change your mind?). That's probably food for a separate thread - maybe in a GitHub Discussion, but thought I'd mention it here.

mwichmann avatar Oct 01 '23 17:10 mwichmann

@bwuerkner I think we'll close this PR and leave the Issue #4426 as a place to ponder this rare issue further?

If you think that's the wrong path please comment further here and we can reopen.

bdbaddog avatar Apr 06 '25 00:04 bdbaddog