Scaling example
This PR adds an example enabling users to perform simple strong and weak scaling studies for the distributed SpMV. The example can be run with 2D (5-pt or 9-pt) and 3D (7-pt or 27-pt) stencil matrices. All matrix values are 1 for now as the result values don't really matter for this example.
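As a hedged illustration of the kind of stencil matrix the example generates (not the PR's actual generator, whose code is in the branch): for a 2D 5-pt stencil on an n × n grid, each row couples a grid point to its four axis neighbors, and the PR sets every value to 1 since the result values don't matter here. A minimal sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: column indices of one row of a 2D 5-pt stencil
// matrix on an n x n grid. All values would be 1, mirroring the
// simplification in the example, so only the sparsity pattern matters.
std::vector<std::int64_t> stencil_row_5pt(std::int64_t n, std::int64_t row)
{
    const auto i = row / n;  // grid coordinates of this row's point
    const auto j = row % n;
    std::vector<std::int64_t> cols;
    if (i > 0) cols.push_back(row - n);      // south neighbor
    if (j > 0) cols.push_back(row - 1);      // west neighbor
    cols.push_back(row);                     // center
    if (j < n - 1) cols.push_back(row + 1);  // east neighbor
    if (i < n - 1) cols.push_back(row + n);  // north neighbor
    return cols;
}
```

The 9-pt (2D) and 7-pt/27-pt (3D) variants follow the same pattern with more neighbor offsets.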
format!
That's a really useful addition!
I have a few quick comments/questions, not only for you but also for people who know more about the MPI backend:
- That only works on a single node, right, since you directly pass the rank to the executor?
- With MPI, isn't there a mode where you use `CUDA_VISIBLE_DEVICES` or similar variables so that each MPI process only sees one executor, which would always be number 0 but different for every MPI process?
- Should OpenMP be disabled (until we can bind threads), or do we need to mention somewhere that the user needs to take care of that through MPI/OpenMP variables, or to only put one rank per machine?
- What safety additions are required so that exceptions don't get thrown all the time? Device allocation and probably some MPI interaction calls would benefit from `try/catch`, or is that managed at a lower level?
@fritzgoebel, I think it might make sense to add this to an updated version of this branch, `mpi-base-dist-mat`, which also runs CI for MPI?
@tcojean,

- Yes, I think you can have a `local_rank` getter, which should be sufficient for a 1-MPI-rank-to-1-GPU association. See for example: https://github.com/ginkgo-project/ginkgo/blob/mpi-base-dist-mat/examples/distributed-solver/distributed-solver.cpp#L95
- Disabling OpenMP is a good idea, and I think we should do that until we fix/update the thread binding.
- I guess you mean each rank throwing an exception? MPI calls are wrapped in `GKO_ASSERT_NO_MPI_ERRORS`, so those should be captured and thrown properly.
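The 1-rank-to-1-GPU mapping mentioned above could be sketched as follows. This is not Ginkgo's actual API (the real getter is in the linked `distributed-solver` example); the sketch instead falls back on the node-local-rank environment variables that common MPI launchers export, which is a widely used way to pick a device id per process:

```cpp
#include <cstdlib>
#include <string>

// Hedged sketch: recover the node-local rank from launcher-specific
// environment variables (Open MPI, MVAPICH2, Slurm). The name
// guess_local_rank is an illustration, not a Ginkgo function.
int guess_local_rank()
{
    for (const char* var : {"OMPI_COMM_WORLD_LOCAL_RANK",
                            "MV2_COMM_WORLD_LOCAL_RANK",
                            "SLURM_LOCALID"}) {
        if (const char* value = std::getenv(var)) {
            return std::stoi(value);
        }
    }
    return 0;  // single-process fallback
}

// The device id passed to the GPU executor would then be something like:
//   int device_id = guess_local_rank() % num_devices;
```

Each process on a node thus gets a distinct device id, independent of its global rank, which addresses the single-node limitation raised above.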
> Yes, I think you can have a `local_rank` getter, which should be sufficient for a 1-MPI-rank-to-1-GPU association. See for example: https://github.com/ginkgo-project/ginkgo/blob/mpi-base-dist-mat/examples/distributed-solver/distributed-solver.cpp#L95

Neat, using this would fix the issue in this example then.
> Disabling OpenMP is a good idea, and I think we should do that until we fix/update the thread binding.

It's maybe also fine as long as people are told to only use one process per machine, but it's indeed not going to be as powerful as people could expect it to be (only that mode should be able to work for now, I think).
> I guess you mean each rank throwing an exception? MPI calls are wrapped in `GKO_ASSERT_NO_MPI_ERRORS`, so those should be captured and thrown properly.

Yes, they are thrown, but then that means most Ginkgo API calls need to be surrounded by the user in `try/catch` blocks so that the exceptions are caught properly by the main function instead of thrown into the wild? https://github.com/ginkgo-project/ginkgo/blob/mpi-base-dist-mat/include/ginkgo/core/base/exception_helpers.hpp#L335
We have the same issue with memory allocations, for example.
Quick mention: it looks like in some places the checks are missing: https://github.com/ginkgo-project/ginkgo/blob/mpi-base-dist-mat/mpi/base/bindings.hpp#L87
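The user-side `try/catch` pattern discussed above could look like this minimal sketch. `run_spmv` is a stand-in for the example's work, and the thrown message is simulated; the point is only that macro-raised C++ exceptions must be caught in the driver to get a clean exit instead of `std::terminate`:

```cpp
#include <iostream>
#include <stdexcept>

// Stand-in for the example's distributed SpMV; a Ginkgo macro such as
// GKO_ASSERT_NO_MPI_ERRORS would throw a C++ exception on failure.
void run_spmv(bool fail)
{
    if (fail) {
        throw std::runtime_error("simulated MPI/allocation failure");
    }
}

// Driver wrapping the API calls so exceptions surface as an error
// message and exit code rather than an uncaught-exception abort.
int run_guarded(bool fail)
{
    try {
        run_spmv(fail);
        return 0;
    } catch (const std::exception& e) {
        std::cerr << "error: " << e.what() << '\n';
        return 1;
    }
}
```

In an MPI setting one might additionally call `MPI_Abort` in the catch block so the other ranks do not hang waiting on the failed one; that choice is not shown here.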
format!
Superseded by #1204