Terry Cojean
I tried to check everything that is "taken care of". Overall, what is left I think is:
+ CUDA Dense -> Hybrid conversions (hard to implement AFAIK)
+ OpenMP CSR -> Ell,...
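(For context, a minimal sketch of how such conversions are exercised from the user side, assuming the standard `convert_to` interface and shown on the reference executor; the matrices and values are purely illustrative, and the missing pieces above are the CUDA/OpenMP kernels backing these same calls.)

```cpp
#include <ginkgo/ginkgo.hpp>

int main()
{
    // run the conversions on the reference executor; the open items above
    // concern the device kernels behind the same calls
    auto exec = gko::ReferenceExecutor::create();

    // Dense -> Hybrid
    auto dense = gko::initialize<gko::matrix::Dense<double>>(
        {{1.0, 0.0}, {0.0, 2.0}}, exec);
    auto hybrid = gko::matrix::Hybrid<double>::create(exec);
    dense->convert_to(hybrid.get());

    // Csr -> Ell
    auto csr = gko::matrix::Csr<double>::create(exec);
    dense->convert_to(csr.get());
    auto ell = gko::matrix::Ell<double>::create(exec);
    csr->convert_to(ell.get());
}
```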
Note: this was updated to reflect the latest status.
As a small comment, or to begin a discussion: I think we might need to explain the Ginkgo `communicator` a bit better now, because I think our wrapper deviates from...
> I don't think we need to impose anything on the device IDs. Most (?) GPUs allow you to run multiple MPI processes on the same GPU, but of course...
I must say I like keeping pointers. The decorator looks fairly good and easy enough to put in place without too many changes; in addition, the last answer with the...
Thanks for taking the time to write all of this, these are useful resources and I will need more time to process them properly. I wanted to mention that this book...
With the previous MRs (#141 and related), a function `compute_norm2` was added for Dense matrices. We are still missing a bunch of norms, and this is only for Dense, but there...
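(For reference, a minimal sketch of how the existing `compute_norm2` is used, with illustrative values: it computes the column-wise Euclidean norms of a Dense matrix into a 1 x num_cols Dense result.)

```cpp
#include <ginkgo/ginkgo.hpp>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    // a 3x1 "vector" stored as a Dense matrix
    auto b = gko::initialize<gko::matrix::Dense<double>>({1.0, 2.0, 2.0}, exec);
    // 1x1 Dense matrix holding the scalar result
    auto norm = gko::initialize<gko::matrix::Dense<double>>({0.0}, exec);
    // Euclidean norm of each column of b; here a single value, 3.0
    b->compute_norm2(norm.get());
}
```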
You are right, I should have read more carefully that this was for the whole matrix as a single entity. Maybe there is some common ground (in terms of interface?)...
That's a really useful addition! I have a few quick comments/questions, not only for you but also for people who know more about the MPI backend:
+ That only works...
> 1. Yes, I think you can have a local_rank getter, which should be sufficient for a 1 MPI rank to 1 GPU association. See for example: https://github.com/ginkgo-project/ginkgo/blob/mpi-base-dist-mat/examples/distributed-solver/distributed-solver.cpp#L95

Neat, using this would fix...
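(A minimal sketch of the idea behind such a local rank, written here with plain MPI calls rather than the wrapper's getter, so the names are only illustrative: split the world communicator by shared-memory node and use the rank within that node as the CUDA device id, giving the 1 MPI rank to 1 GPU association.)

```cpp
#include <mpi.h>
#include <ginkgo/ginkgo.hpp>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // group the ranks that share a node, then use the rank within that group
    // as the CUDA device id -> one MPI rank per GPU
    MPI_Comm local_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                        MPI_INFO_NULL, &local_comm);
    int local_rank;
    MPI_Comm_rank(local_comm, &local_rank);

    auto exec = gko::CudaExecutor::create(local_rank,
                                          gko::OmpExecutor::create());

    MPI_Comm_free(&local_comm);
    MPI_Finalize();
}
```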