amrex
amrex copied to clipboard
Tune PermutationForDeposition for MI250X
Summary
PermutationForDeposition was initially developed for A100. A few tweaks can be made to improve performance on MI250X, which has a smaller cache but is much less sensitive to atomic add congestion.
TODO: more tests and code comments
Additional background
Checklist
The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate