ROC_SHMEM
gda ionic: wave and fence optimizations
Motivation
Increase the maximum message rate by using all enabled threads in the wave for polling completions.
Technical Details
Use all available threads to poll the cq, which increases the maximum message rate. Even when the wave posts only a single wqe, all available threads poll the cq to reserve space in the sq.
The rocshmem abstraction had to change so that it no longer disables gpu threads, for example by making threads take turns or by using only the first thread in a wave. To avoid breaking the other gda implementations, the turn-based and single-thread strategies are reimplemented in post_wqe_rma_turn and post_wqe_rma_single.
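To illustrate why polling with every lane helps, here is a minimal host-side C++ sketch that emulates a wave draining a cq. It is not the code in this PR: `Cqe`, `poll_cq_wave`, `polls_to_drain`, the `valid` flag, and the 64-lane wave width are all hypothetical names and assumptions chosen for illustration. Each emulated lane inspects one cq entry, the per-lane results are combined like a wave ballot, and the contiguous run of ready entries is consumed in a single pass. Passing `lanes = 1` approximates the single-thread strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kWaveSize = 64;  // assumed wavefront width

// Hypothetical completion entry; a real cqe carries more state.
struct Cqe {
  bool valid = false;
};

// One polling pass. Lane i checks cq[(cons + i) % size]. Completions are
// assumed to arrive in order, so only the contiguous run of ready entries
// starting at lane 0 is consumed. Returns the number of entries consumed.
uint32_t poll_cq_wave(std::vector<Cqe>& cq, uint32_t& cons,
                      int lanes = kWaveSize) {
  assert(lanes > 0 && lanes <= kWaveSize);
  assert(cq.size() >= static_cast<std::size_t>(lanes));

  // Emulated ballot: bit i is set when lane i saw a ready entry.
  uint64_t ready = 0;
  for (int lane = 0; lane < lanes; ++lane) {
    if (cq[(cons + lane) % cq.size()].valid) {
      ready |= (uint64_t{1} << lane);
    }
  }

  // Count the trailing run of set bits in the ballot.
  uint32_t done = 0;
  while (done < static_cast<uint32_t>(lanes) && ((ready >> done) & 1)) {
    ++done;
  }

  // Retire the consumed entries and advance the consumer index.
  for (uint32_t i = 0; i < done; ++i) {
    cq[(cons + i) % cq.size()].valid = false;
  }
  cons = static_cast<uint32_t>((cons + done) % cq.size());
  return done;
}

// Counts how many polling passes are needed to drain `completions`
// ready entries from a cq with `cq_entries` slots.
uint32_t polls_to_drain(uint32_t cq_entries, uint32_t completions, int lanes) {
  assert(completions <= cq_entries);
  std::vector<Cqe> cq(cq_entries);
  for (uint32_t i = 0; i < completions; ++i) {
    cq[i].valid = true;
  }

  uint32_t cons = 0;
  uint32_t drained = 0;
  uint32_t passes = 0;
  while (drained < completions) {
    drained += poll_cq_wave(cq, cons, lanes);
    ++passes;
  }
  return passes;
}
```

Under these assumptions, `polls_to_drain(128, 64, 1)` takes 64 passes while `polls_to_drain(128, 64, 64)` takes one, which is the effect the message-rate improvement relies on. On a gpu the per-lane loop would be replaced by the lanes themselves plus a ballot-style wave intrinsic.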
Submission Checklist
- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
- [x] Compile with gda_ionic, gda_bnxt, gda_mlx5
- [x] Verify functionality and that perf improves with ionic
- [x] Verify functionality and that perf does not regress with bnxt
- [x] Verify functionality and that perf does not regress with mlx5