remove order dependency between "using" and "addprocs"
The following sequence resulted in the worker segfaulting and terminating.
- `using PTools`
- `addprocs(1)`
- call a method in PTools. The method creates an anonymous function (in PTools code) and does a `remotecall_fetch` on worker 2 to execute that anonymous function.
- worker 2 terminates at a `deserialize` call.

Flipping the order of `addprocs` and `using` results in proper execution.
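As a minimal sketch of the two orderings (the PTools call here stands in for any function that ships an anonymous function to a worker; the exact failing call is not shown in the report):

```julia
# Crashing order:
using PTools          # package is loaded on pid 1 only
addprocs(1)           # worker 2 starts without PTools
# Any PTools call that does remotecall_fetch of an anonymous
# function on worker 2 then kills the worker at deserialize.

# Working order:
addprocs(1)
using PTools          # code is also made available to worker 2
```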
Suggestion: keep track of the modules loaded in process 1, and load the same modules on all workers whenever we do an `addprocs`.
This seems a bit too magical.
At the very least, the worker should not terminate abruptly; it should give a clean error message.
The module PTools was not even required on the workers. The closure had some references, not obvious from reading the code, which caused the problem. It took me a while to figure it out.
The counterargument to auto-loading modules, of course, is that packages such as visualization libraries are typically only required on process id 1.
It should certainly not crash.
How about an additional keyword argument to `addprocs` called `pkgsync`?
```julia
addprocs(n, pkgsync=true)      # the default: all packages currently loaded on pid 1 are loaded on the worker
addprocs(n, pkgsync=false)     # nothing is loaded on the worker
addprocs(n, pkgsync=[pkgs...]) # only the specified packages are loaded
```
The packages loaded would be the ones listed in `Base.package_list`, I assume. We strip the path and filename extension and try to load these on the worker. If you loaded scripts via `include`, or packages not on the standard search path, they will not be synced; you will need to load those explicitly on the new workers.
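A sketch of the stripping step, assuming `Base.package_list` yields the file paths of packages loaded so far (as described above; verify the exact shape of `package_list` before relying on this):

```julia
# Derive loadable package names from full file paths, e.g.
# "/home/user/.julia/PTools/src/PTools.jl" -> "PTools"
pkgnames = [splitext(basename(p))[1] for p in Base.package_list]

# On addprocs, something along these lines could then run on each new worker:
# for pkg in pkgnames
#     remotecall_fetch(Base.require, newpid, Symbol(pkg))
# end
```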
Julia-users thread about the same issue: https://groups.google.com/d/msg/julia-users/-Y1rc8gkrgo/r6w5f144BkMJ
Perhaps

```julia
using PTools
addprocs(1)
using PTools # currently this has no effect on the worker
```

should cause PTools to be loaded on the worker?
(Discovered while testing JuliaLang/julia#8745 with the Images test, CC @vtjnash.)
To fix the Images test, I added an `@everywhere` before the import statement. With the JuliaLang/julia#8745 changes to `require`, syncing modules during `addprocs` might be easier now.
I got a module reload when I fixed it that way; I just pushed a different fix (to Images master, not tagged). But the fix uses a PR against JuliaLang/julia#8745 I'll be submitting shortly.
@amitmurthy I think the level of magic in `using`/`require` here is either too low or too high.
- Too low?

  If `using` automatically loads modules on workers, then either modules loaded before `addprocs` should propagate, or `using`, as in the example in https://github.com/JuliaLang/Distributed.jl/issues/17, should load the module on the workers. Right now, you'll have to `@everywhere using Module` if the module is already loaded on master, but the `@everywhere` solution ends up in a mess if the module hasn't already been loaded on master.

- Too high?

  If we removed all the `require` magic where the code is also executed on workers, then you could just always use `@everywhere using Module`. It would be quite simple to reason about.
Too high. It would be much clearer if just doing `@everywhere using Module` worked. Rather than automatically doing code loading from the head node, how to load code should be configurable, and you could say `@workers command_that_tells_workers_to_load_code_from_node_1()`. Otherwise workers would just load code locally, like a normal Julia process does by default.
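For reference, the explicit pattern being argued for looks like this in current Julia (a sketch; it assumes the package, here the stdlib `Statistics`, is on every node's own load path):

```julia
using Distributed
addprocs(2)

# Each worker loads the module from its own load path; nothing is
# implicitly shipped from the head node.
@everywhere using Statistics

# The anonymous function below only deserializes cleanly on worker 2
# because Statistics is already loaded there.
remotecall_fetch(() -> mean([1, 2, 3]), 2)
```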
Related - https://github.com/JuliaLang/Distributed.jl/issues/20
Status of this?