Global Variables Aren't Properly Captured By Spawn
The spawn decorator currently messes with the closure of the function it's passed in order to change the scoping semantics. (See https://github.com/ut-parla/Parla.py/blob/d82e6d573a5cd5b1e492a99a0106cd90e632961e/parla/tasks.py#L416.) Though the current comments call it a "hack", it is actually an integral part of how Parla functions, and it uses only supported APIs. We need to double down on it for semantic consistency. The current handling applies this hack only to nonlocal variables; we need to do it for globals too. Closures should acquire objects and not allow name lookups to be redirected through an outer scope.
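For reference, the existing nonlocal handling amounts to something like the following sketch (the idea, not the actual code in tasks.py): snapshot the current contents of each closure cell into fresh cells, so later writes to the enclosing variables are invisible to the task.

import types

def detach_closure(func):
    # Sketch of the existing "hack": copy the current contents of each
    # closure cell into a fresh cell (types.CellType is constructible
    # since Python 3.8). Later rebindings of the enclosing nonlocal
    # variables are then invisible to the task.
    if func.__closure__ is None:
        return func
    frozen = tuple(types.CellType(cell.cell_contents)
                   for cell in func.__closure__)
    return types.FunctionType(func.__code__, func.__globals__,
                              func.__name__, func.__defaults__, frozen)

Note that func.__globals__ is passed through untouched, which is exactly the asymmetry this issue is about.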
Example demonstrating the issue:
from parla import Parla
from parla.tasks import *
with Parla():
    for i in range(3):
        @spawn()
        def foo():
            print(i)
should print
0
1
2
but instead prints
2
2
2
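For comparison, plain Python closures in a loop behave the same way, and the standard workaround is to capture the value explicitly, e.g. through a default argument. Spawn is meant to do the equivalent capture automatically:

fns = [lambda: print(i) for i in range(3)]
for fn in fns:
    fn()  # 2 2 2: all three closures share the same loop variable

fns = [lambda i=i: print(i) for i in range(3)]
for fn in fns:
    fn()  # 0 1 2: the default argument captures the value at definition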
Naively, it seems that it would be enough to just copy the globals dict attached to the function, and that would fix the code that @sestephens73 mentions above. However, it would break recursive and mutually recursive functions containing tasks, since in Python such functions rely on the fact that the globals dict is expanded to contain the recursive reference AFTER the original function is created. So there may be a need to capture the current globals in one dict and then expand it with any values added later, without updating previously assigned values. The problem with this is that it would create odd semantics in cases where the user code decides which function to use for something AFTER declaring a function with tasks.
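Concretely, this is the pattern that a naive copy of __globals__ at definition time would break:

# Mutually recursive module-level functions work only because each
# call looks the other name up in the live globals dict at call time.
def is_even(n):
    return n == 0 or is_odd(n - 1)   # is_odd does not exist yet here

def is_odd(n):
    return n != 0 and is_even(n - 1)

print(is_even(10))  # True; a snapshot of is_even.__globals__ taken at
                    # its definition would have missed is_odd entirely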
Okay, this seems like it might have performance penalties, but as a strawman suggestion... what if we capture non-function global objects into __closure__ as well?
> what if we capture non-function global objects into __closure__ as well?
- Things other than functions can be recursive.
- This would force us to rewrite the function bytecode, since a global read and a closure read are different instructions.
Yah, I'm not sure what to do about modifying bytecode. It'd be nice if we could avoid that. Thanks for pointing out that the opcodes are different. I'm also still really worried about performance penalties from having to iterate over __globals__ at spawn time. That strikes me as bad.
Okay, what if we go through the global names that are used in the function's bytecode, check if they're currently defined in __globals__, then modify the bytecode of the task to look up things that were defined at task creation time in __closure__ instead. I don't love this idea, but it seems like it might achieve the desired effect.
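The name-discovery half of that is straightforward with the dis module; a sketch (note that builtins like len show up too, since they are also read via LOAD_GLOBAL):

import dis

def global_reads(func):
    # Names the function reads via LOAD_GLOBAL: the candidates for
    # snapshotting at task creation time.
    return {ins.argval for ins in dis.get_instructions(func)
            if ins.opname == "LOAD_GLOBAL"}

def f():
    return transform(data) + len(data)  # hypothetical globals

print(global_reads(f))  # {'transform', 'data', 'len'}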
Worth noting: probably the most important question here is "what semantics do we actually want to support?" That'll determine the hoops we have to jump through to make things "work". We also don't have to solve this today.
It looks like recursive local functions are handled by creating a cell which is filled later with the declaration of the other function. This is very reasonable. But I hadn't really thought about it before.
https://replit.com/@arthurp3/RecursiveClosures#main.py
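The linked example boils down to something like this minimal reconstruction:

def outer():
    def fact(n):
        return 1 if n <= 1 else n * fact(n - 1)
    return fact

f = outer()
# fact refers to itself through a closure cell that is filled in only
# once the `def fact` statement finishes executing inside outer.
print(f.__closure__[0].cell_contents is f)  # True
print(f(5))                                 # 120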
This means that closure "detach" logic will already break recursive local functions where the recursion is inside a task. I don't think I ever tried this:
def outer():
    def f():
        @spawn()
        def task():
            f()
    f()
I don't have time to test and play with this, but I think this will crash (and not just make infinite tasks like it should). Interestingly, if f were a top-level function this would work (currently) exactly because of the global variable capture bug, which I think is why quicksort was able to run.
So we need a principled way to handle recursive values in both locals and globals. Currently they are wrong in opposite ways.
It's not at all clear what the correct semantics is. I see a few potential options:
- Type directed: late-bind functions, and early-bind all other values. This is hard to implement and confusing: many things are callable in Python, some things that you don't think of as functions might be, and some things you might think of as functions are not.
- Implicit based on declaration context: everything that is available at declaration time is early-bound, everything else is late-bound. This is actually fairly simple to implement (I think), and I don't actually agree with @insertinterestingnamehere that this would have a larger performance cost. However, it would make the semantics of closure capture depend on the declaration order of functions and imports. For example, importing after a function is declared is potentially different from importing the same module before the function.
- Explicit annotation of late-bound values (sketched below): the only time early binding is a problem is for recursive functions, or for those declared out of dependency order in the global scope. These are not super common cases, so having the user specify values that should be late-bound may be acceptable. By default all values are early-bound, even globals.
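To make option 3 concrete, the spawn site might look something like this (purely a strawman; late_bind is a hypothetical parameter, not existing Parla API):

def produce(n):
    @spawn(late_bind=["consume"])  # hypothetical annotation
    def task():
        consume(n)  # looked up when the task runs, so it is fine
                    # that consume is declared below produce

def consume(n):
    if n > 0:
        produce(n - 1)

Everything not listed in the annotation would be early-bound as usual.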
I would vote for 3, barring a common use case for lots of recursive functions or out-of-dep-order declarations.
Yah, I think options 2 or 3 are both fine. I was originally thinking we'd want something like 2, but 3 seems perfectly fine. We should probably have an override mechanism for capturing names anyway.
Option 2 is a bit "magical". For instance,
for i in range(3):
    @spawn()
    def t():
        slow_op()
        print(x)
    x = i
would work, most of the time, and would print 2 0 1, most of the time. The first task would late-bind x (it spawns before x is ever assigned) and so would read the final value 2; the later tasks would early-bind the values 0 and 1.
for i in range(3):
    x = i
    @spawn()
    def t():
        slow_op()
        print(x)
would work consistently, and would print 0 1 2.
To me, this variation in behavior between very similar programs is not acceptable for a language semantics. I think the former case should be an error, since there really isn't any way to give it a semantics that makes sense.
Yah, that's an excellent point. Looking at the code for the first one, I can totally see what's wrong, but that doesn't mean it wouldn't lead to obscure bugs in a larger codebase. OTOH, the bad example doesn't resemble working serial code. Having a debug mode where each spawn call blocks until the task completes would catch this kind of thing.
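That debug mode could be a trivial shim substituted for spawn (a sketch, not existing Parla functionality):

def spawn_debug(*args, **kwargs):
    # Debug stand-in for spawn: run the task body immediately instead
    # of scheduling it, so capture bugs surface deterministically.
    def decorator(body):
        body()
        return body
    return decorator

Under this shim, the first "magical" example above fails immediately with a NameError, because x is read before it is ever assigned.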
I'm still somewhat ambivalent. The main point I see in favor of option 2 is that it still maintains Python's independence of definition order for functions within a module.
I'm still concerned about performance as well. Copying __globals__ seems like a fast track to having performance look good in benchmarks and then degrade when we try to use it in large projects. It's a very small dictionary in our small examples, but what about codebases with thousands of lines and hundreds of variables in a single file? What about when people use star imports? With this we'd essentially be letting our task launch overhead depend on the number of variables in the source file it comes from.
Rewriting the bytecode is a pain, but reading it to find out which globals the function actually uses, and copying only those into the new globals dict for the function, seems feasible.
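A sketch of that approach, using co_names as a conservative over-approximation of the globals a function can touch (this misses indirect access through eval or globals(), and none of these helper names are Parla API):

import types

def with_snapshot_globals(func):
    # Rebuild func over a copy of only the globals it can name.
    # co_names also lists attribute names, but the filter below only
    # copies entries that actually exist in the live globals dict.
    live = func.__globals__
    frozen = {name: live[name] for name in func.__code__.co_names
              if name in live}
    # Builtins are resolved through __globals__, so carry them over.
    frozen["__builtins__"] = live.get("__builtins__", __builtins__)
    return types.FunctionType(func.__code__, frozen, func.__name__,
                              func.__defaults__, func.__closure__)

y = 1
def task():
    print(y)

task = with_snapshot_globals(task)
y = 2
task()  # prints 1: the later rebinding of y is not seen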