Distributed.jl

remove order dependency between "using" and "addprocs"

Open amitmurthy opened this issue 12 years ago • 12 comments

The following sequence resulted in the worker segfaulting and terminating.

  • using PTools
  • addprocs(1)
  • call a method in PTools. The method creates an anonymous function (in PTools code) and calls remotecall_fetch on worker 2 to execute that anonymous function.
  • worker 2 terminates at a deserialize call.

Flipping the order of addprocs and using results in proper execution.
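For readers on a current Julia, the working order described above can be sketched roughly as follows. This is only an illustration, not the PTools code: it assumes Julia 1.x with the Distributed standard library, and it uses the stdlib module Statistics as a stand-in for PTools, which may not be installed.

```julia
using Distributed

# Add the worker first, so it exists before any module is loaded.
new_pids = addprocs(1)

# Load the module on every process, including the new worker.
# Statistics is a stand-in for PTools (an assumption for this sketch).
@everywhere using Statistics

# A closure that references the module can now be sent to the worker and run there.
w = first(new_pids)
result = remotecall_fetch(() -> mean([1.0, 2.0, 3.0]), w)
println(result)

# Clean up the worker added for this sketch.
rmprocs(new_pids)
```

Loading with @everywhere after addprocs makes the load explicit on every process, so the result does not depend on whatever automatic propagation a given Julia version performs.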

I suggest that we keep track of the modules loaded in process 1 and load the same modules on all workers whenever we do an addprocs.

amitmurthy avatar Jul 10 '13 16:07 amitmurthy

This seems a bit too magical.

ViralBShah avatar Jul 10 '13 17:07 ViralBShah

At the very least, the worker should not terminate abruptly; it should give a clean error message.

The module PTools was not even required on the workers. The closure had some references - not obvious looking at the code - which caused the problem. It took me a while to figure it out.

The counterargument to auto-loading modules, of course, is that visualization packages are typically required only on process id 1.

amitmurthy avatar Jul 10 '13 17:07 amitmurthy

It should certainly not crash.

ViralBShah avatar Jul 10 '13 17:07 ViralBShah

How about an additional keyword argument to addprocs called pkgsync?

addprocs(n, pkgsync=true)    # The default. All packages currently loaded on 1 are loaded on the worker
addprocs(n, pkgsync=false)    # Nothing is loaded on the worker
addprocs(n, pkgsync=[pkgs...])    # Only specified packages are loaded

The packages loaded are the ones listed in Base.package_list I assume. We strip the path and filename extension and try to load these on the worker.

If you loaded scripts using include, or packages that are not on the standard search path, they will not be sync'ed; you will need to load them explicitly on the new workers.
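The explicit-list form of this proposal can be approximated today with a small wrapper. The sketch below is hypothetical: addprocs_pkgsync is a name invented here, not part of Distributed, and it covers only the pkgsync=[pkgs...] case, taking package names as symbols. It assumes Julia 1.x and uses the stdlib module Statistics as an example package.

```julia
using Distributed

# Hypothetical helper (not part of Distributed): add `n` workers and load
# the named packages on each new worker. It mirrors only the explicit-list
# form of the proposed `pkgsync` keyword.
function addprocs_pkgsync(n::Integer; pkgsync::Vector{Symbol}=Symbol[])
    pids = addprocs(n)
    for pkg in pkgsync
        # Build the expression `using <pkg>`.
        ex = Expr(:using, Expr(:., pkg))
        for p in pids
            # Evaluate it in Main on the worker and wait for it to finish.
            remotecall_fetch(Core.eval, p, Main, ex)
        end
    end
    return pids
end

pids = addprocs_pkgsync(1; pkgsync=[:Statistics])

# Check on the worker that the module name is now bound in Main.
loaded = remotecall_fetch(() -> isdefined(Main, :Statistics), first(pids))
println(loaded)

# Clean up the worker added for this sketch.
rmprocs(pids)
```

As with the proposal itself, code loaded through include or from outside the worker's load path is not covered and would still need to be loaded explicitly.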

amitmurthy avatar Feb 06 '14 06:02 amitmurthy

Julia users link regarding the same issue - https://groups.google.com/d/msg/julia-users/-Y1rc8gkrgo/r6w5f144BkMJ

amitmurthy avatar May 06 '14 02:05 amitmurthy

Perhaps

using PTools
addprocs(1)
using PTools   # currently this has no effect on the worker

should cause PTools to be loaded on the worker?

(Discovered while testing JuliaLang/julia#8745 with the Images test, CC @vtjnash.)

timholy avatar Jul 11 '15 14:07 timholy

To fix the Images test, I added an @everywhere before the import statement. With the JuliaLang/julia#8745 changes to require, syncing modules during addprocs might be easier now.

vtjnash avatar Jul 11 '15 15:07 vtjnash

I got a module reload when I fixed it that way; I just pushed a different fix (to Images master, not tagged). But the fix uses a PR against JuliaLang/julia#8745 I'll be submitting shortly.

timholy avatar Jul 11 '15 16:07 timholy

@amitmurthy I think the level of magic in using/require here is either too low or too high.

  • Too low? If using automatically loads modules on workers then either modules loaded before addprocs should propagate or using, as in the example in https://github.com/JuliaLang/Distributed.jl/issues/17, should load the module on the workers. Right now, you'll have to @everywhere using Module if the module is already loaded on master, but the @everywhere solution ends up in a mess if the module hasn't already been loaded on master.
  • Too high? If we removed all the require magic where the code is also executed on workers then you could just always use @everywhere using Module. It would be quite simple to reason about.

andreasnoack avatar May 26 '16 13:05 andreasnoack

Too high. It would be much clearer if just doing @everywhere using Module worked. Rather than automatically doing code loading from the head node, how to load code should be configurable and you can say @workers command_that_tells_workers_to_load_code_from_node_1(). Otherwise they would just load code locally like a normal Julia process does by default.

StefanKarpinski avatar May 26 '16 13:05 StefanKarpinski

Related - https://github.com/JuliaLang/Distributed.jl/issues/20

amitmurthy avatar May 27 '16 03:05 amitmurthy

Status of this?

Nosferican avatar Jun 03 '19 04:06 Nosferican