Dagger.jl icon indicating copy to clipboard operation
Dagger.jl copied to clipboard

`@spawn` has confusing move semantics

Open ExpandingMan opened this issue 3 years ago • 8 comments

Note that

◖◗ Dagger.@spawn single=2 println(myid())
EagerThunk (running)

◖◗       From worker 2:        1
◖◗

◖◗ Dagger.spawn(() -> println(myid()), single=2)
EagerThunk (running)

◖◗       From worker 2:        2

I think what's happening here is that in the former case it is evaluating myid() on process 1 and the function executed on worker 2 is just println(1).

This seems like a problem, it's hard to guess that these examples would be different. I suggest that @spawn should match spawn so that its arguments can be read as if it were the function body passed to spawn. Otherwise I think it will need a lot of documentation.

A stronger argument might be that Distributed.@spawnat already works as I'm suggesting:

◖◗ @spawnat 2 println(myid());

◖◗       From worker 2:        2

(My suggestion is clearly badly breaking at least in principle, so this is tricky.)

ExpandingMan avatar Sep 09 '22 22:09 ExpandingMan

Right, this has always been somewhat annoying. It's basically the same issue that @atomic has, where only a single operation (in this case, a function call) is actually submitted through Dagger.

The equivalent to your Dagger.spawn example is Dagger.@spawn (()->println(myid()))(), which works as you'd expect (but is obviously pretty annoying to write). I'm wary of extending @spawn to (automatically) recursively process calls, because it can lead to poor performance if users submit expressions which contain many small function calls (because of task execution overhead). That said, I am open to adding an option to @spawn to make it process all function calls in an expression; in fact, the logic for this already exists within the expression-processing logic builtin to @spawn, it's just disabled: https://github.com/JuliaParallel/Dagger.jl/blob/bfe450290b6cea126fa6350dfd40657d7d450531/src/thunk.jl#L337 (see how recur=false is manually forced, irrespective of what recur value is passed to the parent function).

I'm also very open to enhancing the documentation to explicitly call this behavior out, showing how it can cause unexpected behavior, and showing how to work around it with inline anonymous functions.

jpsamaroo avatar Sep 11 '22 12:09 jpsamaroo

Dagger.@spawn (()->println(myid()))(), which works as you'd expect (but is obviously pretty annoying to write).

For what it's worth, I would never have guessed that. To me the bigger issue is not how annoying it is to write but that it's very hard to look at the argument to @spawn and know what it's going to do.

ExpandingMan avatar Sep 11 '22 15:09 ExpandingMan

For what it's worth, I would never have guessed that.

Right, most users would not, which I'm realizing is because Dagger.@spawn doesn't actually work the same way as Threads.@spawn. This is because Threads.@spawn always constructs an anonymous function/closure, whereas Dagger.@spawn really wants just a single function call (because we use function signatures for behavior tracking in the scheduler). If we always constructed anonymous functions, Dagger wouldn't be able to balance workloads by recognizing behavioral patterns.

Aside from documenting this difference, long-term I'd be willing to switch Dagger.@spawn to mimic Threads.@spawn, and have a different macro (or just a macro argument) for just wrapping single function calls. In the short-term, we can do 2 things:

  1. Add an option to enable recursive processing of expressions, as pointed out above, probably specified as Dagger.@spawn parse=:recur println(myid()), which would generate Dagger.@spawn println(Dagger.@spawn myid()). This can enable a lot of parallelism in a single @spawn call, but in this case, is not exactly what you want.
  2. If executed as Dagger.@spawn begin println(myid()) end, construct an anonymous function closure automatically, which would be functionally the same as Dagger.@spawn (()->println(myid()))(). We can also add an argument to the macro to enable this when we omit the begin ... end, such as Dagger.@spawn parse=:closure println(myid()).

Either option has their merits, so we should really implement both of them. If this sounds good to you and you feel up to the challenge, you're welcome to take on this PR; otherwise I'll probably get to it in the next few weeks.

jpsamaroo avatar Sep 14 '22 13:09 jpsamaroo

Thanks, no rush.

I think that unless Dagger.@spawn were renamed it should have the same behavior as Threads.@spawn (at least by default), otherwise it's just too confusing. Since renaming would be badly breaking (not to mention the difficulty of finding an appropriate name), that seems to leave us with having to change it.

ExpandingMan avatar Sep 14 '22 14:09 ExpandingMan

After thinking about it, I'm in agreement that we should switch the default to parse=:closure; the current behavior should probably become parse=:call, meaning "parse just one function call as a Dagger call". It's pretty unlikely that this will negatively impact users (and might even fix some accidental bugs downstream in the process).

jpsamaroo avatar Sep 15 '22 14:09 jpsamaroo

I was messing around a bit more today and another thing I notice which I find pretty confusing is that @spawn will only interpolate thunk results one call deep... for example

a = @spawn f(1)
b = @spawn f(2)
c = @spawn g(vcat(a, b))

will fail because vcat is trying to operate on thunks, not the result of fetch. It's not that it doesn't make sense, but it limits the usefulness of @spawn relative to just doing spawn.

ExpandingMan avatar Oct 04 '22 14:10 ExpandingMan

Right, I also noticed this while working on implementing the new parsing behavior. Thankfully, with the upcoming new closure-based behavior, Julia should capture a and b into the closure struct, and we should be able to extract them and pass them as arguments to the spawned thunk (so that they're treated correctly).

jpsamaroo avatar Oct 19 '22 01:10 jpsamaroo

I think part of this will require using Adapt.jl to search for thunks in the closure and in regular arguments (because we don't know at parse time whether a variable is captured from the surrounding scope). While definitely not trivial to implement, it would be a marked improvement in behavior (as users often expect that, for example, an array of thunks will be automatically unwrapped).

jpsamaroo avatar Oct 19 '22 01:10 jpsamaroo