MPIClusterManagers.jl

How to delete a manager?

Open einzigsue opened this issue 7 years ago • 0 comments

Hi All,

I just installed MPI.jl and found that I cannot shut down the MPIManager cleanly. Is there a way to gracefully shut down an MPIManager in MPI_ON_WORKERS mode?

When I do

julia> using MPI
julia> manager = MPIManager(np=4)
julia> workers = addprocs(manager)
julia> @parallel (+) for i in 1:4 rand(Bool) end
julia> exit()

I receive the following error message:

WARNING: Forcibly interrupting busy workers
INFO: INFO: INFO: pid=6516 id=3 op=interrupt pid=6516 id=4 op=interrupt pid=6516 id=5 op=interrupt
CompositeException(Any[CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)]), CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)]), CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)])])

I then tried calling rmprocs(workers, waitfor=60.0) before exit(), but in Julia v0.6.0 it fails with "ERROR: UndefVarError: set_worker_state not defined".

If I call MPI.Finalize() before exit() on the head process, it terminates Julia with the following error message:

*** The MPI_Finalize() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.

I then tried calling MPI.Finalize() on each worker

@everywhere using MPI
for w in workers
    @spawnat w MPI.Finalize()
end

before exit(), but I receive error messages in Julia like the following:

From worker 2: [(null):21884] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 3: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 3: *** This is disallowed by the MPI standard.
From worker 3: *** Your MPI job will now abort.
From worker 3: [(null):21886] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 5: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 5: *** This is disallowed by the MPI standard.
From worker 5: *** Your MPI job will now abort.
From worker 5: [(null):21891] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 4: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 4: *** This is disallowed by the MPI standard.
From worker 4: *** Your MPI job will now abort.
From worker 4: [(null):21888] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
INFO: pid=21626 id=2 op=deregister
INFO: INFO: INFO: pid=21626 id=3 op=deregister pid=21626 id=4 op=deregister pid=21626 id=5 op=deregister
Worker 3 terminated.
Worker 4 terminated.ERROR (unhandled task failure): EOFError: read end of file

Worker 5 terminated.ERROR (unhandled task failure): EOFError: read end of file

ERROR (unhandled task failure): EOFError: read end of file

How is an MPIManager in MPI_ON_WORKERS mode expected to be shut down?

Cheers,
Yue

einzigsue, Sep 10 '17 06:09