TagBoxArray::collate() runs into scaling issues when running on GPUs
TagBoxArray::collate() can be a significant cost in GPU builds at scale: the number of tagged boxes may be a non-trivial fraction of the domain, yet we're doing this calculation serially.
Yup. This is a known issue. Did especially badly on KNL cores.
We are thinking about ways to improve regridding at scale ... one obvious way would be to split the domain into octants, for example, and regrid each separately.
Would Castro and other AMReX-based codes be open to this type of regridding? It won't give the same patches as the current algorithm, but it is a quick and easy fix that seems pretty benign.
Thoughts?
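To make the octant idea concrete, here is a minimal sketch of binning tagged cells by octant so that each bin could be handed to the clustering step separately. `Cell`, `Region`, `octantOf`, and `binTagsByOctant` are hypothetical names for illustration, not AMReX types or functions, and the split is assumed to be at the domain midpoint; the clustering itself is not shown.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; NOT amrex::IntVect / amrex::Box.
struct Cell { int x, y, z; };
struct Region { Cell lo, hi; };  // inclusive cell-index bounds

// Index (0..7) of the octant containing c, splitting the domain at its midpoint.
// Bit 0 = upper half in x, bit 1 = upper half in y, bit 2 = upper half in z.
int octantOf(const Cell& c, const Region& domain) {
    const int mx = domain.lo.x + (domain.hi.x - domain.lo.x + 1) / 2;
    const int my = domain.lo.y + (domain.hi.y - domain.lo.y + 1) / 2;
    const int mz = domain.lo.z + (domain.hi.z - domain.lo.z + 1) / 2;
    return (c.x >= mx ? 1 : 0) | (c.y >= my ? 2 : 0) | (c.z >= mz ? 4 : 0);
}

// Sort the tagged cells into eight independent bins, one per octant.
// Each bin could then be clustered into patches on its own.
std::array<std::vector<Cell>, 8> binTagsByOctant(const std::vector<Cell>& tags,
                                                 const Region& domain) {
    std::array<std::vector<Cell>, 8> bins;
    for (const Cell& c : tags) {
        // Precondition: every tag lies inside the domain.
        assert(c.x >= domain.lo.x && c.x <= domain.hi.x);
        assert(c.y >= domain.lo.y && c.y <= domain.hi.y);
        assert(c.z >= domain.lo.z && c.z <= domain.hi.z);
        bins[octantOf(c, domain)].push_back(c);
    }
    return bins;
}
```

Because no patch can span an octant boundary in this scheme, the resulting grids would generally differ from those of a single global pass.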
On Sun, Jun 7, 2020 at 4:20 PM Marc Day [email protected] wrote:
Yup. This is a known issue. Did especially badly on KNL cores.
-- Ann Almgren Senior Scientist; CCSE Group Lead
It couldn’t hurt, but it might be better if the splitting planes could be problem-dependent.
It looks straightforward-ish to do this operation on the GPU. Not too dissimilar from operations we already do for particles. I could give it a try this week.
Yeah, I think we should just get the operation happening on the GPU.
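For reference, one common way to structure this kind of gather so that it parallelizes is count, prefix-sum, then fill. The sketch below is plain serial C++ that only shows the structure; it is not AMReX's implementation, and `TagFlags`, `Collated`, and `collateTags` are hypothetical names. The loops in passes 1 and 3 are independent per box, which is what would make them candidates for a GPU kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one box's tag data: a flat array of 0/1 flags.
using TagFlags = std::vector<char>;

struct Collated {
    std::vector<std::size_t> offsets;  // size nboxes + 1; box b owns [offsets[b], offsets[b+1])
    std::vector<std::size_t> cells;    // flat within-box index of every tagged cell
};

Collated collateTags(const std::vector<TagFlags>& boxes) {
    const std::size_t n = boxes.size();

    // Pass 1: count tagged cells in each box (independent per box).
    std::vector<std::size_t> counts(n, 0);
    for (std::size_t b = 0; b < n; ++b) {
        for (char flag : boxes[b]) {
            if (flag) { ++counts[b]; }
        }
    }

    // Pass 2: exclusive prefix sum, so offsets[b] = counts[0] + ... + counts[b-1].
    Collated out;
    out.offsets.assign(n + 1, 0);
    std::partial_sum(counts.begin(), counts.end(), out.offsets.begin() + 1);
    out.cells.resize(out.offsets[n]);

    // Pass 3: each box writes into its own disjoint slice (independent per box).
    for (std::size_t b = 0; b < n; ++b) {
        std::size_t pos = out.offsets[b];
        for (std::size_t i = 0; i < boxes[b].size(); ++i) {
            if (boxes[b][i]) { out.cells[pos++] = i; }
        }
        assert(pos == out.offsets[b + 1]);
    }
    return out;
}
```

Since every box knows its output slice before writing, no box has to wait on another during the fill.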