Tune PermutationForDeposition for MI250X

Open AlexanderSinn opened this issue 1 year ago • 0 comments

Summary

PermutationForDeposition was initially developed for the A100. A few tweaks improve its performance on the MI250X, which has a smaller cache but is much less sensitive to atomic-add congestion.

TODO: more tests and code comments
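
To illustrate the trade-off named above (cache locality versus atomic-add congestion), here is a minimal CPU-only sketch. It is not the AMReX implementation and does not reproduce how PermutationForDeposition orders particles. All names (`WindowStats`, `window_stats`, `example_cells`, `cell_sorted_perm`, `interleaved_perm`) are hypothetical, and the fixed-size window is only an assumed stand-in for a group of GPU threads that deposit at the same time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical proxy metrics, not part of AMReX.
struct WindowStats {
    int max_same_cell;   // most particles in any one window that hit the same cell
    int distinct_cells;  // distinct cells touched per window, summed over windows
};

// Walk the particles in the order given by `perm`, in windows of `width`
// consecutive entries, and collect the two proxies above.
WindowStats window_stats(const std::vector<int>& cell_of_particle,
                         const std::vector<int>& perm, int width)
{
    assert(width > 0);
    WindowStats s{0, 0};
    const std::size_t w = static_cast<std::size_t>(width);
    for (std::size_t start = 0; start < perm.size(); start += w) {
        const std::size_t end = std::min(perm.size(), start + w);
        std::vector<int> cells;
        for (std::size_t i = start; i < end; ++i) {
            cells.push_back(cell_of_particle[static_cast<std::size_t>(perm[i])]);
        }
        std::sort(cells.begin(), cells.end());
        int run = 0;
        for (std::size_t j = 0; j < cells.size(); ++j) {
            const bool same = (j > 0 && cells[j] == cells[j - 1]);
            run = same ? run + 1 : 1;
            if (!same) { ++s.distinct_cells; }
            s.max_same_cell = std::max(s.max_same_cell, run);
        }
    }
    return s;
}

// Assumed toy data: 32 particles, 8 per cell, 4 cells.
std::vector<int> example_cells()
{
    std::vector<int> cells(32);
    for (int i = 0; i < 32; ++i) { cells[static_cast<std::size_t>(i)] = i / 8; }
    return cells;
}

// Order A: particles grouped by cell (the toy data is already grouped).
std::vector<int> cell_sorted_perm()
{
    std::vector<int> perm(32);
    for (int k = 0; k < 32; ++k) { perm[static_cast<std::size_t>(k)] = k; }
    return perm;
}

// Order B: round-robin over the 4 cells, so neighbours hit different cells.
std::vector<int> interleaved_perm()
{
    std::vector<int> perm(32);
    for (int k = 0; k < 32; ++k) {
        perm[static_cast<std::size_t>(k)] = (k % 4) * 8 + k / 4;
    }
    return perm;
}
```

With a window of 8, the cell-sorted order touches one cell per window (good locality) but puts all 8 particles of a window on the same cell (worst-case contention in this proxy). The interleaved order touches 4 cells per window and puts at most 2 particles on any one cell. Which end of that range is faster on a given GPU is what the tuning in this PR is about; the sketch only makes the two quantities concrete.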

Additional background

[Image: sp4d_amd]

Checklist

The proposed changes:

  • [ ] fix a bug or incorrect behavior in AMReX
  • [ ] add new capabilities to AMReX
  • [ ] change answers in the test suite to more than roundoff level
  • [ ] are likely to significantly affect the results of downstream AMReX users
  • [ ] include documentation in the code and/or rst files, if appropriate

AlexanderSinn, May 07 '24 15:05