Distributed.jl
`using` loads modules on workers but does not put exported bindings in Main
This may be intended, but it seems a bit awkward to me:
$ julia -p 1
[...]
Version 0.4.0-dev+1922 (2014-12-02 23:10 UTC)
Commit e4e1688* (0 days old master)
x86_64-apple-darwin14.0.0
julia> using StatsBase
julia> @everywhere assert(isa(StatsBase.sample, Function))
julia> @everywhere assert(isa(sample, Function))
exception on 2: ERROR: sample not defined
in eval at /usr/local/julia/base/sysimg.jl:7
in anonymous at multi.jl:1395
in anonymous at multi.jl:820
in run_work_thunk at multi.jl:593
in run_work_thunk at multi.jl:602
in anonymous at task.jl:6
Obviously the workaround is `@everywhere using StatsBase`, but I think that `using X` should probably be equivalent to `@everywhere using X`, or if not, it should act exclusively on the main process. Instead it seems we get `using X` on the main process and `require("X")` on the workers.
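For reference, a minimal sketch of that workaround in current Julia syntax. It assumes the `Distributed` standard library (which postdates the report above) and uses `Statistics` as a stand-in for StatsBase so that nothing has to be installed; both choices are assumptions for illustration, not what the original session ran.

```julia
using Distributed

# Start one worker, mirroring `julia -p 1` from the report.
addprocs(1)

# Statistics stands in for StatsBase here (an assumption made for portability).
# Running `using` under @everywhere evaluates it in Main on every process,
# so the exported names are brought into scope on the workers as well.
@everywhere using Statistics

# Evaluate an unqualified call to `mean` in Main on each process.
results = [remotecall_fetch(Core.eval, p, Main, :(mean([1, 2, 3]))) for p in procs()]
```

If the unqualified name were missing from Main on a worker, the corresponding `remotecall_fetch` would rethrow the worker's error rather than return a value.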
I'm going to label this as a bug since it seems clear that this couldn't have been the intended semantics.
I've been bitten by this loaded-but-still-out-of-scope bug several times. And no one I've pointed this out to has any idea why it is the way it is.
I don't think it's intentional, I just think that no one has yet decided that they care enough to fix it. You could be the one!
@timholy Any idea where the definition of using is located? It's kind of hard to grep for.
IIUC require(::Symbol)
I think we need to fix this in the 0.5 timeframe itself. @everywhere using Mod or @everywhere require are problematic in themselves and responsible for https://github.com/JuliaLang/julia/issues/12381 and probably https://github.com/JuliaLang/julia/issues/16788
Which Julia/C function is called with using?
Base.require(:JSON) is the equivalent of import JSON on all nodes.
It would be a bit annoying to go another release without fixing it. I once (long ago) spent a couple tens of minutes on this, but didn't get far enough to figure out how to do it or even to fully trace how using actually works.
@vtjnash Where is using implemented? It does a bit more than Base.require(Mod), just unable to trace it.
https://github.com/JuliaLang/julia/blob/59b253031af87f62e7d70a7d8848cdfd4a84288b/src/toplevel.c#L450-L464
I think I can fix this pretty easily, but there seems to be some debate going on over the appropriate fix to make this consistent. It seems clear that, wherever using/import causes modules to be required, they should also alter the bindings. The question is whether using/import should require the module only on the worker they're being called on (and thus @everywhere using X would always be necessary to load modules if they will be used on workers), or whether using/import should both require the module and import the bindings on all the workers. Opinions?
+1 for requiring the module and importing the bindings on all the workers, in terms of consistency.
However, a different debate is whether to have using / import load on all workers or only on the calling process. Specifically w.r.t. plotting / visualization packages which are irrelevant on the workers. In which case we should require an explicit @everywhere using whenever we want to load a package everywhere.
My issue with the current behavior is two-fold:
- Requiring `using ModuleName` prior to `@everywhere using ModuleName` seems weird and unintuitive (see https://github.com/JuliaLang/julia/issues/16189)
- `@everywhere using ModuleName` wouldn't be a problem if it threw a specific error pointing the user to the proper solution. Something like: "`ModuleName` is not loaded on worker `X`. Consider running `@everywhere using ModuleName` to load it on all workers."
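To make the suggested error message concrete, here is a rough sketch of a user-level check. The helpers `processes_missing` and `require_everywhere` are hypothetical names invented for this illustration; they are not part of Base or Distributed, and the sketch only tests whether a name resolves in Main on each process, which is an assumption about what "loaded" should mean here.

```julia
using Distributed

# Hypothetical helper, not part of Distributed: list the processes on which
# `name` cannot be resolved in Main.
function processes_missing(name::Symbol)
    return [p for p in procs() if !remotecall_fetch(isdefined, p, Main, name)]
end

# Hypothetical check that raises the kind of message proposed above.
function require_everywhere(modname::Symbol)
    missing_on = processes_missing(modname)
    isempty(missing_on) && return nothing
    error("$modname is not loaded on worker(s) $(join(missing_on, ", ")). " *
          "Consider running `@everywhere using $modname` to load it on all workers.")
end
```

Calling `require_everywhere(:ModuleName)` before dispatching work would turn a late, confusing failure on a worker into an early one that names the fix.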
> Specifically w.r.t. plotting / visualization packages which are irrelevant on the workers.
Gadfly and company are really heavy and would be slow to load on all workers. I agree that an explicit @everywhere using call would be better due to the performance implications.
We'd have to see how the two versions work in practice with actual parallel usage, but I'd lean towards making using and import local-by-default unless annotated with @everywhere.
It looks like local-by-default is most people's preferred option. My main concern is https://github.com/JuliaLang/julia/issues/3680, which would mean that, in many situations, all your workers would die if you forget the @everywhere. Not a great experience, especially for people trying to do stuff in parallel for the first time.
JuliaLang/julia#3680 can be partially worked around by
- having errors during deserialization send back a specific error, say `DeserializationError`, before closing the connection.
- The other end prints the error as a warning and simply reconnects.
- This leads to a loss of messages already serialized, but is not an issue in the situation where folks forget the `@everywhere` before retrying again.
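The first step of that idea can be sketched roughly as follows. This is only an illustration using the `Serialization` standard library on an in-memory buffer: the `DeserializationError` type and the `try_deserialize` function are hypothetical, the name being borrowed from the suggestion above, and nothing here reflects how the worker connection code is actually structured.

```julia
using Serialization

# Hypothetical wrapper type; the name comes from the suggestion above, but no
# such type is claimed to exist in Base or Distributed.
struct DeserializationError <: Exception
    cause::Any
end

# Try to read one message. On failure, return a DeserializationError value
# that a caller could send back to the peer before closing the connection.
function try_deserialize(io::IO)
    try
        return deserialize(io)
    catch err
        return DeserializationError(err)
    end
end

# Round trip of a well-formed message.
buf = IOBuffer()
serialize(buf, (:call, [1, 2, 3]))
seekstart(buf)
ok = try_deserialize(buf)

# An empty stream stands in for a message the receiver cannot decode.
bad = try_deserialize(IOBuffer())
```

In the scenario discussed here the failure would more likely come from a type or module that is unknown on the receiving side; the empty buffer is just the simplest way to trigger an error in a self-contained example.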
The other existing issue is a probable race condition with precompilation happening in parallel with @everywhere using
I'll submit a PR for the JuliaLang/julia#3680 workaround.
Bump. JuliaLang/julia#3680 is closed. Implement local-by-default loading?
Yes let's try it.
I'm curious - what's the status on this?
This bug tripped me up in 0.7.0-beta, and "using X; @everywhere using X" solves it, whereas I didn't need to use that trick on 0.6.3 (or any of the previous 0.6 releases). Weird..