oshmpi
Strided: Analyze overhead of strided datatype creation and decoding
Description: Analyze the overhead of strided datatype creation and decoding in the strided RMA path. OSHMPI must be configured with --disable-strided-cache to disable its datatype cache optimization, so that the creation cost is not hidden by the cache.
Starting point:
- Look into the `OSHMPI_create_strided_dtype` function in OSHMPI.
- Can use `tests/iput.c` as the test program.
Hints: The datatype created in OSHMPI is always a resized vector with blocklength=1.
Evaluation platform: LCRC/Bebop Broadwell and KNL are preferred
Estimated effort: 3 days to 1 week
TODOs:
- [ ] Overhead breakdown of strided datatype creation in OSHMPI/MPICH/Yaksa path on CPU
- [ ] Overhead breakdown of strided datatype creation in OSHMPI/MPICH/dataloop path on CPU (optional)
- [ ] Overhead analysis of strided datatype decoding in PUT in OSHMPI/MPICH path (assume Yaksa and dataloop are the same) on CPU
- [ ] Overhead breakdown of strided datatype creation in OSHMPI/MPICH/Yaksa path on KNL
- [ ] Overhead breakdown of strided datatype creation in OSHMPI/MPICH/dataloop path on KNL (optional)
- [ ] Overhead analysis of strided datatype decoding in PUT in OSHMPI/MPICH path (assume Yaksa and dataloop are the same) on KNL