
Prefix Julia error output with rank

Open sloede opened this issue 4 years ago • 16 comments

Currently, if you are running a Julia/MPI program in parallel and something bad happens, you get a lot of ERROR: LoadError: LoadError: UndefVarError: ... messages, which are all horribly interleaved. This in itself is a known "user issue" with MPI and (probably) cannot be fixed in an efficient manner. However, it would already help a lot if, when running Julia/MPI programs, the error messages included the global rank, such that a user has at least a fighting chance of finding out which rank died first. E.g., something like ERROR (rank 2): LoadError: LoadError: UndefVarError: ...

I don't know if this is even possible (injecting information in the Julia runtime output) without changes to upstream Julia, but it would IMHO be a great help to many scientists.

sloede avatar Mar 10 '20 14:03 sloede

I don't think there is a way to do this across different MPI implementations.

  • For Open MPI you can use --tag-output: https://www.open-mpi.org/doc/current/man1/mpiexec.1.php
  • For MPICH you can set the MPIEXEC_PREFIX_DEFAULT environment variable: https://www.mpich.org/static/docs/latest/www1/mpiexec.html

Alternatively, you could write to a file with MPI I/O using a shared file pointer (MPI_FILE_WRITE_SHARED), though we don't expose this function yet (PRs welcome!).
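For illustration, a hedged sketch of what that could look like from Julia. MPI.File.write_shared below is a hypothetical binding for MPI_FILE_WRITE_SHARED (not exposed by MPI.jl at the time of this comment); the file name is made up:

```julia
using MPI

MPI.Init()
# Every rank opens the same file; the shared file pointer serializes
# writes, so lines from different ranks are not interleaved mid-line.
fh = MPI.File.open(MPI.COMM_WORLD, "error.log"; write=true, create=true)
msg = "rank $(MPI.Comm_rank(MPI.COMM_WORLD)): something bad happened\n"
MPI.File.write_shared(fh, Vector{UInt8}(msg))  # hypothetical binding
close(fh)
```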

simonbyrne avatar Mar 10 '20 17:03 simonbyrne

For what it's worth, I was unable to get MPIEXEC_PREFIX_DEFAULT to do anything with MPICH, but the -prepend-rank flag (which is documented in the help screen but not the man page) does work.

simonbyrne avatar Mar 11 '20 03:03 simonbyrne

One option would be an interface such as Cprintf in Chapter 8 of the "Parallel Programming with MPI" book https://github.com/cyliustack/benchmark/blob/b91924d5dc842906ebf94d4b154d548d944a030f/mpi/ppmpi/chap08/cio.c

We could define an interface like

MPI.Cprint(comm, root) do io
   print(io, ...)
end

which would be collective over comm, and copy all the data to root, and print it from there.
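A minimal sketch of such a collective print, assuming MPI.jl's MPI.gather for arbitrary (serialized) objects; the function name cprint is illustrative, not an actual MPI.jl API, and unlike the cio.c reference it uses a gather rather than point-to-point messages:

```julia
using MPI

# Collective over comm: each rank renders its output into a local buffer,
# the root gathers the strings and prints them in rank order.
function cprint(f, comm::MPI.Comm; root::Integer=0)
    io = IOBuffer()
    f(io)
    chunks = MPI.gather(String(take!(io)), comm; root=root)
    if MPI.Comm_rank(comm) == root
        for (i, chunk) in enumerate(chunks)
            print("rank $(i - 1): ", chunk)
        end
    end
end

MPI.Init()
cprint(MPI.COMM_WORLD) do io
    println(io, "hello from rank $(MPI.Comm_rank(MPI.COMM_WORLD))")
end
```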

simonbyrne avatar Jun 10 '20 17:06 simonbyrne

I also suggested what I think is a better solution to the MPI forum: https://github.com/mpi-forum/mpi-issues/issues/296

simonbyrne avatar Jun 10 '20 18:06 simonbyrne

I also suggested what I think is a better solution to the MPI forum: mpi-forum/mpi-issues#296

This sounds like a good suggestion. However, would we benefit from this for the error output of Julia itself? In that case, the Julia executable would have to be somehow "MPI-aware", wouldn't it?

sloede avatar Jun 10 '20 18:06 sloede

I'm not quite sure yet how it would work. One option would be to modify Base.stdout, but I don't think that is a good idea as it won't help with the interleaving issue.
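For reference, a sketch of what re-tagging stdout within one process could look like, using the no-argument form of redirect_stdout() that returns the read end of an in-process pipe. This prefixes complete lines, but as noted, lines from different ranks still race to the terminal:

```julia
using MPI

MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)

orig = stdout
rd, _ = redirect_stdout()        # stdout now feeds an in-process pipe
@async for line in eachline(rd)  # re-emit each complete line, prefixed
    println(orig, "rank $rank: ", line)
end

println("hello")                 # reaches the terminal with a rank prefix
```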

Interestingly, I did try out using the shared file pointers with /dev/stdout: it works with Open MPI, but not MPICH.

simonbyrne avatar Jun 10 '20 18:06 simonbyrne

cf https://github.com/pmodels/mpich/issues/4632

simonbyrne avatar Jun 10 '20 19:06 simonbyrne

I am back at this issue again, since we started parallelizing Trixi.jl with MPI. It's really annoying that if there is a runtime issue that only occurs on a subset of all ranks (or even just one), there is no way to discern this from the error message - instead, you have to re-run and this time add copious amounts of println statements that include the MPI rank.

Do you think it would be feasible to convince upstream Julia to add an option to specify a prefix that is added to all output lines? And would it even be possible to implement something like this in a sane way? I'm thinking about something like

julia -e 'using MPI; MPI.Init(); Base.error_prefix(string(MPI.Comm_rank(MPI.COMM_WORLD)) * ": ")' script.jl

that would turn

ERROR: MethodError: no method matching String(::Int64)
Closest candidates are:
  String(::String) at boot.jl:321
  String(::Array{UInt8,1}) at strings/string.jl:39
  String(::Base.CodeUnits{UInt8,String}) at strings/string.jl:77
  ...
Stacktrace:
 [1] top-level scope at REPL[3]:1

into

17: ERROR: MethodError: no method matching String(::Int64)
17: Closest candidates are:
17:   String(::String) at boot.jl:321
17:   String(::Array{UInt8,1}) at strings/string.jl:39
17:   String(::Base.CodeUnits{UInt8,String}) at strings/string.jl:77
17:   ...
17: Stacktrace:
17:  [1] top-level scope at REPL[3]:1

I don't know; as I'm writing this, I can already feel that this is not a very elegant solution, but neither can I come up with something better. It's just that not being able to re-use the compile cache while developing a Julia package with MPI is already painful enough (compared to compiled languages), and the fact that there's no obvious way to connect "compiler errors" to the ranks on which they occur just makes this worse :-(
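In the meantime, a workaround sketch that needs no Julia changes: wrap the script in try/catch and prefix the rendered error yourself. (Base.error_prefix above is hypothetical; the wrapper below only uses existing APIs, and the script path is illustrative.)

```julia
using MPI

MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)
try
    include("script.jl")   # illustrative path
catch err
    # render error + backtrace into a string, then prefix every line
    report = sprint(showerror, err, catch_backtrace())
    for line in split(report, '\n')
        println(stderr, "$rank: ", line)
    end
    MPI.Abort(MPI.COMM_WORLD, 1)
end
```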

sloede avatar Sep 10 '20 13:09 sloede

If you are running under Slurm or another manager, you can also direct the stderr to a file per rank, or create a wrapper script that adds the Slurm task ID as a prefix to the output.

The most reliable way is to use Open MPI with --tag-output.

Adding an option to Julia would be interesting, but very invasive, especially if you are interested in errors and not just logged messages.

vchuravy avatar Sep 10 '20 14:09 vchuravy

@vchuravy Thanks a lot for these suggestions! As far as I can tell from the manual, with Slurm I can use, e.g.,

#SBATCH --error=errors-%j-%t.out

which redirects all errors to a file identified by the job id and the task id (= rank).

or create a wrapper script that add the slurm task ID as a prefix to the output.

How would I be able to achieve this?

The most reliable way is to use OpenMPI with --tag-output

This is very interesting indeed. ~~However, I have found this only for OpenMPI - do you know whether there is a similar option for MPICH (which seems to be the default for MPI.jl under Linux)?~~

EDIT: I just found it... for MPICH, the -l flag prefixes the rank to each output. Note that I literally mean each output (and not each line of output): it seems the rank is prefixed to each print statement. Here's what print_timer() output looks like from the MPI root: [screenshot]

sloede avatar Sep 11 '20 05:09 sloede

Yeah Simon mentioned that he had trouble with MPICH https://github.com/JuliaParallel/MPI.jl/issues/360#issuecomment-597426035

How would I be able to achieve this?

I don't have a ready made solution, but as an example:

➜  ~ julia -e "error()" 2>&1 | ts "[1]"
[1] ERROR: 
[1] Stacktrace:
[1]  [1] error() at ./error.jl:42
[1]  [2] top-level scope at none:1

which uses ts from moreutils for other ideas look here https://unix.stackexchange.com/questions/440426/how-to-prefix-any-output-in-a-bash-script

and then you can use something like:

cat > launch.sh << EoF_s
#! /bin/sh
exec "\$@" 2>&1 | ts "[\$PMI_RANK] "
EoF_s
chmod +x launch.sh

srun --mpi=pmi2 ./launch.sh julia

Where PMI_RANK would be the environment variable for the global rank. (Caveat: I haven't tested the above.)

vchuravy avatar Sep 11 '20 09:09 vchuravy

The main pain point for me is interleaving within a line, and these workarounds don't fix that issue. Can something be done about that? E.g., line buffering?

antoine-levitt avatar Oct 31 '20 08:10 antoine-levitt

Not that I know of: unfortunately there are no APIs for controlling buffers (each MPI implementation handles the output combination differently).

simonbyrne avatar Oct 31 '20 21:10 simonbyrne

Can something be done about that?

Short of writing an I/O handler that controls all output to the terminal, no, I don't think so. Non-interleaved line output to the terminal means that there would have to be a central instance that controls the output, which implies global serialization. Since this runs contrary to the core goals of MPI, I don't think this feature will ever be provided by the MPI libraries themselves.

You can do something like this on your own for output to files, using MPI I/O (I've done this for logging purposes before), but it quickly becomes very slow (IIRC, with >100 cores the overhead is already significant). Otherwise I think you'll have to implement it yourself, I'm afraid :-/

sloede avatar Nov 01 '20 05:11 sloede

That is pretty annoying. The simplest solution (and probably the only one that makes sense for larger process counts) is to do all the printing on process 0. The problem with that is that external libraries (e.g., Optim) don't know about MPI. A pretty brutal solution to that is to use redirect_stdout() on processes > 0. It's a hack, but it works.
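The hack mentioned above can be as small as this sketch (note: redirect_stdout(devnull) needs Julia >= 1.7; on older versions, redirecting to open("/dev/null", "w") works):

```julia
using MPI

MPI.Init()
if MPI.Comm_rank(MPI.COMM_WORLD) != 0
    # Silence every rank except the root; libraries that print
    # unconditionally (e.g. Optim) then produce output only once.
    redirect_stdout(devnull)   # Julia >= 1.7; else open("/dev/null", "w")
end
```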

antoine-levitt avatar Nov 01 '20 09:11 antoine-levitt

Yes, all sufficiently large (i.e., beyond toy size) MPI-parallel programs that I know of only print from the MPI root. That's no help, though, if you're debugging and/or experiencing runtime errors, where you typically don't control I/O. The problem with external libraries is exactly the reason for me to create this issue (here, Julia being the "external" library).

sloede avatar Nov 01 '20 09:11 sloede