devito
cross-loop blocking in staggered TTI and *elastic propagators
This should improve runtime performance of TTI staggered.
The basic infrastructure is already in the codebase (it is used, for example, by the CIRE algorithm), but it currently doesn't support the case in which all of the involved loop nests write to user-provided data. In the typical CIRE use case, all but one of the loop nests write to DSE-generated temporary Arrays.
the description in the original message is now obsolete, but the (performance) issue is still there:
```
for x
  for y
    ...
for x
  for y
    ...
```
there could be data reuse across these two loop nests, and we're currently dropping it on the floor
some tiling technique could/should be used instead
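To make the missed reuse concrete, here is a minimal sketch in plain Python. It is not Devito-generated code or Devito API, and the function names are hypothetical. Two loop nests sweep the same x/y domain, and the second reads `u` only at the same point. Cross-loop blocking runs both nests tile by tile, so each tile of `u` is consumed while it is still hot in cache. This simple form is only legal because the dependency between the nests is pointwise.

```python
def two_nests(a, nx, ny):
    """Reference: two separate loop nests, as in the pseudocode above."""
    u = [[0.0] * ny for _ in range(nx)]
    v = [[0.0] * ny for _ in range(nx)]
    for x in range(nx):          # first nest writes u
        for y in range(ny):
            u[x][y] = 2.0 * a[x][y]
    for x in range(nx):          # second nest reads u at the same point
        for y in range(ny):
            v[x][y] = u[x][y] + 1.0
    return u, v


def blocked_two_nests(a, nx, ny, bx, by):
    """Hypothetical cross-loop blocking: both nests run inside each (bx, by) tile."""
    u = [[0.0] * ny for _ in range(nx)]
    v = [[0.0] * ny for _ in range(nx)]
    for xb in range(0, nx, bx):
        for yb in range(0, ny, by):
            xs = range(xb, min(xb + bx, nx))
            ys = range(yb, min(yb + by, ny))
            for x in xs:         # first nest, restricted to this tile
                for y in ys:
                    u[x][y] = 2.0 * a[x][y]
            for x in xs:         # second nest reuses the tile of u just written
                for y in ys:
                    v[x][y] = u[x][y] + 1.0
    return u, v
```

Both versions perform the same floating-point operations in the same order per point, so their results are identical for any tile sizes.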
closing as nonsensical in retrospect
elastic requires skewing; tti-staggered works just like any other code with cross-derivatives, which CIRE assigns to temporaries
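Why skewing is needed can be seen in a small sketch, again plain Python with hypothetical names, not Devito's implementation. It assumes an elastic-like pattern where the second nest reads the first nest's output at neighbouring points (`u[x-1]`, `u[x+1]`, radius 1). If both nests naively run over the same x-tile, the last row of the second nest reads `u[xe]` before the next tile has computed it. Shifting the second nest back by the stencil radius, i.e. a skew proportional to the nest's position in the sequence, makes the fused schedule legal.

```python
def reference(a, nx, ny):
    """Unfused reference: nest 2 reads neighbours of nest 1's output along x."""
    u = [[0.0] * ny for _ in range(nx)]
    v = [[0.0] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            u[x][y] = 2.0 * a[x][y]
    for x in range(1, nx - 1):
        for y in range(ny):
            v[x][y] = u[x - 1][y] + u[x + 1][y]
    return u, v


def naive_fused(a, nx, ny, bx):
    """Illegal: nest 2 uses the same x-tile, so it reads u[xe] too early."""
    u = [[0.0] * ny for _ in range(nx)]
    v = [[0.0] * ny for _ in range(nx)]
    for xb in range(0, nx, bx):
        xe = min(xb + bx, nx)
        for x in range(xb, xe):
            for y in range(ny):
                u[x][y] = 2.0 * a[x][y]
        for x in range(max(xb, 1), min(xe, nx - 1)):
            for y in range(ny):
                v[x][y] = u[x - 1][y] + u[x + 1][y]
    return u, v


def skewed_fused(a, nx, ny, bx):
    """Legal: nest 2 is shifted back by the stencil radius (1) within each tile."""
    u = [[0.0] * ny for _ in range(nx)]
    v = [[0.0] * ny for _ in range(nx)]
    for xb in range(0, nx, bx):
        xe = min(xb + bx, nx)
        for x in range(xb, xe):
            for y in range(ny):
                u[x][y] = 2.0 * a[x][y]
        # Every u[x + 1] with x < xe - 1 has already been computed.
        for x in range(max(xb - 1, 1), min(xe - 1, nx - 1)):
            for y in range(ny):
                v[x][y] = u[x - 1][y] + u[x + 1][y]
    return u, v
```

The shifted ranges `[xb - 1, xe - 1)` tile the interior exactly once, so the skewed schedule matches the reference while the naive one does not.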