
More coverage/clean up for split grid

Open hummingtree opened this issue 4 years ago • 1 comments

Improvements to split grid in the future:

  • Add support for split grid + multi-shift. It should be straightforward.
  • Add support for split grid when the number of sub-partitions is not equal to the number of sources.
  • Refactor the command line option groups and combine all communication-related options into a comms group.
  • Use smart pointers in callMultiSrcQuda, e.g. for input_clover.
  • Address the points @weinbe2 raised in the comments in #1107.
  • Make split grid compatible with NVSHMEM.
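
As a rough illustration of the smart-pointer item above, here is a minimal sketch of how an optional buffer could be owned by a `std::unique_ptr` instead of a raw pointer. `FieldBuffer`, `make_buffer` and `buffer_size` are hypothetical stand-ins, not QUDA types or functions, and the real `input_clover` handling in callMultiSrcQuda may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a field such as input_clover; not a QUDA type.
struct FieldBuffer {
  std::vector<double> data;
  explicit FieldBuffer(std::size_t n) : data(n, 0.0) { assert(n > 0); }
};

// Returns an owning pointer, or a null pointer when the field is not needed.
// The buffer is released automatically when the unique_ptr goes out of scope,
// so no matching delete is required on any exit path.
std::unique_ptr<FieldBuffer> make_buffer(bool needed, std::size_t n)
{
  if (!needed) return nullptr;
  return std::make_unique<FieldBuffer>(n);
}

// A consumer that only borrows the buffer takes a non-owning raw pointer,
// as a C-style interface would.
std::size_t buffer_size(const FieldBuffer *buf)
{
  return buf ? buf->data.size() : 0;
}
```

A caller would keep the `std::unique_ptr` alive for the duration of the solve and hand `ptr.get()` to any interface that expects a raw pointer.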

hummingtree avatar Dec 10 '20 20:12 hummingtree

Thoughts motivated by #1107 :

  1. Why does QMP's comm_rank_global -> QMP_get_node_number work to always pick the global rank zero, as opposed to comm_rank -> QMP_comm_get_node_number? I'd assume QMP_get_node_number would grab the current communicator as opposed to the default. Have you looked under the hood to make sure it'll always behave as expected? (And/or should I?)

  2. I think there's a potential risk in the MPI backend: if user_set_comm_handle in the constructor is set to true, the "default" MPI world won't be MPI_COMM_WORLD, which runs afoul of the MPI backend's comm_rank_global baking in MPI_COMM_WORLD. A simple solution could be caching the default MPI world in the MPI backend's Communicator constructor, similar to how we're saving some default state with the QMP backend. (This may also be helpful for the QMP backend re: my concern above.) This could be a consideration that existed even before the original split grid PR.

If anything, these questions may call for a stricter definition of the Communicator API.
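
To make the caching idea in point 2 concrete, here is a minimal sketch under stated assumptions. `FakeComm` is a stand-in for `MPI_Comm` so that the example compiles without MPI, and `CommunicatorSketch` and `split` are hypothetical names; only `comm_rank` and `comm_rank_global` mirror names used above. It is not the actual QUDA Communicator.

```cpp
#include <cassert>

// Stand-in for MPI_Comm. A real backend would hold an MPI_Comm and query
// the rank with MPI_Comm_rank instead of reading a stored value.
struct FakeComm {
  int rank; // rank of this process within the communicator
};

class CommunicatorSketch
{
  const FakeComm *default_comm; // cached once at construction, untouched by splitting
  const FakeComm *current_comm; // replaced when the grid is split

public:
  // user_handle == nullptr models user_set_comm_handle == false, in which
  // case the world communicator is used as the default.
  CommunicatorSketch(const FakeComm &world, const FakeComm *user_handle = nullptr) :
    default_comm(user_handle ? user_handle : &world), current_comm(default_comm)
  {
    assert(default_comm != nullptr);
  }

  // Models switching to a sub-partition communicator for split grid.
  void split(const FakeComm &sub) { current_comm = &sub; }

  // Rank within whichever communicator is currently active.
  int comm_rank() const { return current_comm->rank; }

  // Rank within the cached default, rather than a hard-coded world communicator.
  int comm_rank_global() const { return default_comm->rank; }
};
```

With this layout, `comm_rank_global` keeps reporting the rank in the user-supplied communicator after a split, while `comm_rank` follows the active sub-communicator.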

weinbe2 avatar Feb 09 '21 18:02 weinbe2