libCEED
Opt Backend Assembly
The /cpu/self/opt/* backends should implement their own versions of diagonal and full assembly that assemble element by element. Most of the pieces already exist in the code, but they are spread out.
Current:
Assemble QFunction (all elements at once)
for (elem in l-vec) {
Assemble Operator element
}
New:
for (elem in l-vec) {
Assemble QFunction element
Assemble Operator element
}
This is very similar to our approach for operator application, except that we would probably want to keep the block size set to 1 for simplicity. Then we can set /cpu/self/opt/serial as the operator fallback for /cpu/self/opt/blocked.
This would hopefully decrease the assembly memory footprint significantly (and speed things up) for the Opt, AVX, and XSMM backends.
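To make the difference between the two loop structures concrete, here is a minimal toy sketch in plain C. It does not use the libCEED API: `AssembleQFunctionElement`, `AssembleOperatorElement`, the sizes, and the interpolation values are all hypothetical stand-ins, and the diagonal is stored per element with no scatter to an l-vec. The only point is that both orderings should produce the same result, while the per-element version needs QFunction scratch for a single element rather than for all of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical per-element sizes, chosen only for illustration.
enum { NUM_QPTS = 2, NUM_NODES = 2 };

// Toy stand-in for "Assemble QFunction element": fills q_data[NUM_QPTS].
static void AssembleQFunctionElement(size_t elem, double *q_data) {
  for (size_t q = 0; q < NUM_QPTS; q++) {
    q_data[q] = 1.0 + (double)elem + 0.5 * (double)q;
  }
}

// Toy stand-in for "Assemble Operator element":
// elem_diag[n] = sum_q interp[q][n]^2 * q_data[q].
static void AssembleOperatorElement(const double *q_data, double *elem_diag) {
  static const double interp[NUM_QPTS][NUM_NODES] = {{0.75, 0.25}, {0.25, 0.75}};
  for (size_t n = 0; n < NUM_NODES; n++) {
    elem_diag[n] = 0.0;
    for (size_t q = 0; q < NUM_QPTS; q++) {
      elem_diag[n] += interp[q][n] * interp[q][n] * q_data[q];
    }
  }
}

// "Current" ordering: QFunction data for every element is assembled first.
// Returns the number of scratch doubles used for QFunction data.
static size_t AssembleDiagonalCurrent(size_t num_elem, double *diag) {
  if (num_elem == 0) return 0;
  double *q_data = malloc(num_elem * NUM_QPTS * sizeof *q_data);
  assert(q_data != NULL);
  for (size_t e = 0; e < num_elem; e++) {
    AssembleQFunctionElement(e, &q_data[e * NUM_QPTS]);
  }
  for (size_t e = 0; e < num_elem; e++) {
    AssembleOperatorElement(&q_data[e * NUM_QPTS], &diag[e * NUM_NODES]);
  }
  free(q_data);
  return num_elem * NUM_QPTS;
}

// "New" ordering: QFunction and operator assembly are fused per element,
// so scratch is needed for one element only.
static size_t AssembleDiagonalByElement(size_t num_elem, double *diag) {
  double q_data[NUM_QPTS];
  for (size_t e = 0; e < num_elem; e++) {
    AssembleQFunctionElement(e, q_data);
    AssembleOperatorElement(q_data, &diag[e * NUM_NODES]);
  }
  return NUM_QPTS;
}
```

In this toy version the scratch size drops from `num_elem * NUM_QPTS` to `NUM_QPTS` doubles. Whether the real backends see a comparable reduction depends on how the existing pieces are wired together.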