
Scaling example

Open fritzgoebel opened this issue 3 years ago • 5 comments

This PR adds an example enabling users to perform simple strong and weak scaling studies for the distributed SpMV. The example can be run with 2D (5-pt or 9-pt) and 3D (7-pt or 27-pt) stencil matrices. All matrix values are 1 for now as the result values don't really matter for this example.
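For context, a rough sketch of how such a stencil can be assembled (an illustration only, not the example's actual code; it builds a 5-pt 2D stencil with unit values via gko::matrix_data):

```cpp
#include <ginkgo/ginkgo.hpp>

// Illustration only: assemble a 5-pt stencil on an n x n grid, all values 1.
// Rows/columns follow the usual lexicographic grid ordering.
gko::matrix_data<double, int> build_5pt_stencil(int n)
{
    gko::matrix_data<double, int> data{gko::dim<2>(n * n, n * n)};
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            const int row = i * n + j;
            data.nonzeros.emplace_back(row, row, 1.0);                     // center
            if (i > 0) data.nonzeros.emplace_back(row, row - n, 1.0);      // south
            if (i < n - 1) data.nonzeros.emplace_back(row, row + n, 1.0);  // north
            if (j > 0) data.nonzeros.emplace_back(row, row - 1, 1.0);      // west
            if (j < n - 1) data.nonzeros.emplace_back(row, row + 1, 1.0);  // east
        }
    }
    return data;
}
```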

fritzgoebel avatar Sep 30 '21 11:09 fritzgoebel

format!

fritzgoebel avatar Sep 30 '21 11:09 fritzgoebel

That's a really useful addition!

I have a few quick comments/questions, not only for you but also for people who know more about the MPI backend:

  • That only works on a single node, right, since you directly pass the rank to the executor?
    • With MPI, isn't there a mode where you use CUDA_VISIBLE_DEVICES or similar variables so that each MPI process only sees one device, which would always be number 0 but different for every MPI process? (See the sketch after this list.)
    • Should OpenMP be disabled (until we can bind threads), or do we need to mention somewhere that the user has to take care of that through MPI/OpenMP variables, or only put one rank per machine?
  • What safety additions are required so that exceptions don't get thrown all the time? Device allocation and probably some MPI interaction calls would benefit from try/catch, or is that managed at a lower level?
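For reference, a minimal sketch (plain MPI-3 plus the CUDA runtime, not code from this PR) of how each process could derive a node-local rank and pick its own device, as an alternative to juggling CUDA_VISIBLE_DEVICES:

```cpp
#include <cuda_runtime.h>
#include <mpi.h>

// Hypothetical helper: compute the rank within the node via a shared-memory
// sub-communicator and map it to a device id; that id would then be passed to
// the executor (e.g. gko::CudaExecutor::create) instead of the global rank.
int node_local_device(MPI_Comm comm)
{
    MPI_Comm node_comm;
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                        &node_comm);
    int local_rank = 0;
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_free(&node_comm);

    int num_devices = 0;
    cudaGetDeviceCount(&num_devices);
    return num_devices > 0 ? local_rank % num_devices : 0;
}
```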

tcojean avatar Sep 30 '21 11:09 tcojean

@fritzgoebel, I think it might make sense to add this to an updated version of the mpi-base-dist-mat branch, which also runs CI for MPI?

@tcojean,

  1. Yes, I think you can have a local_rank getter, which should be sufficient for a 1 MPI rank to 1 GPU association. See for example: https://github.com/ginkgo-project/ginkgo/blob/mpi-base-dist-mat/examples/distributed-solver/distributed-solver.cpp#L95 (a small sketch of the pattern follows after this list).

  2. Disabling OpenMP is a good idea and I think we should do that until we fix/update the thread binding.

  3. I guess you mean each rank throwing an exception? MPI calls are wrapped in GKO_ASSERT_NO_MPI_ERRORS, so those should be captured and thrown properly.
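For point 1, a minimal sketch of the pattern (loosely modeled on the linked distributed-solver example; the helper name and the use of the CUDA runtime here are assumptions, not the example's exact code):

```cpp
#include <ginkgo/ginkgo.hpp>

#include <cuda_runtime.h>
#include <memory>
#include <string>

// Hypothetical helper: create one executor per MPI rank, mapping each rank on
// a node to its own GPU via the node-local rank.
std::shared_ptr<gko::Executor> create_executor(const std::string& name,
                                               int local_rank)
{
    if (name == "cuda") {
        int num_devices = 0;
        cudaGetDeviceCount(&num_devices);
        const int device_id = num_devices > 0 ? local_rank % num_devices : 0;
        return gko::CudaExecutor::create(device_id,
                                         gko::ReferenceExecutor::create());
    }
    if (name == "omp") {
        return gko::OmpExecutor::create();
    }
    return gko::ReferenceExecutor::create();
}
```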

pratikvn avatar Sep 30 '21 12:09 pratikvn

  1. Yes, I think you can have a local_rank getter, which should be sufficient for a 1 MPI rank to 1 GPU association. See for example: https://github.com/ginkgo-project/ginkgo/blob/mpi-base-dist-mat/examples/distributed-solver/distributed-solver.cpp#L95

Neat, using this would fix the issue in this example then.

  2. Disabling OpenMP is a good idea and I think we should do that until we fix/update the thread binding.

It's maybe also fine as long as people are told to only use one process per machine, but it's indeed not going to be as powerful as people could expect it to be (only that mode should be able to work for now, I think).
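For the "disable OpenMP" route, a minimal sketch (an assumption about how it could be handled, not what the PR does) of restricting each rank to a single thread until the binding is sorted out:

```cpp
#include <omp.h>

// Equivalent to launching with OMP_NUM_THREADS=1: keep each MPI rank on a
// single OpenMP thread so that ranks sharing a node do not oversubscribe it.
void limit_openmp_threads() { omp_set_num_threads(1); }
```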

  3. I guess you mean each rank throwing an exception? MPI calls are wrapped in GKO_ASSERT_NO_MPI_ERRORS, so those should be captured and thrown properly.

Yes, they are thrown, but that means most Ginkgo API calls need to be surrounded by the user with try/catch blocks so that the exceptions are caught properly in the main function instead of being thrown into the wild? https://github.com/ginkgo-project/ginkgo/blob/mpi-base-dist-mat/include/ginkgo/core/base/exception_helpers.hpp#L335

We have the same issue with memory allocations, for example.
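A minimal sketch of the kind of user-side guard I have in mind (not an API proposal), so that an exception on one rank aborts the whole job instead of leaving the other ranks stuck in collectives:

```cpp
#include <ginkgo/ginkgo.hpp>

#include <mpi.h>

#include <exception>
#include <iostream>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    try {
        // ... create the executor, build the stencil matrix, run the SpMV ...
    } catch (const std::exception& e) {
        // Without this, an exception escaping main on one rank would leave
        // the remaining ranks waiting forever in collective calls.
        std::cerr << "error: " << e.what() << std::endl;
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Finalize();
    return 0;
}
```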

Quick mention: it looks like the checks are missing in some places: https://github.com/ginkgo-project/ginkgo/blob/mpi-base-dist-mat/mpi/base/bindings.hpp#L87

tcojean avatar Sep 30 '21 13:09 tcojean

format!

fritzgoebel avatar Sep 30 '21 13:09 fritzgoebel

Superseded by #1204

MarcelKoch avatar Jul 07 '23 07:07 MarcelKoch