libCEED
SVE Backend
This now compiles and passes the t3 tests on Ookami.
ToDo:
- [ ] Performance comparisons
- [x] Makefile flag fix (not sure how the AVX one even works)
- [ ] Improved vectorization instructions?
- [x] Add SVE backend to README
- [x] Do we want manual unrolling for `opt`? The unrolling is pretty straightforward
I remember from Ookami's talk that performance with GCC was the worst among all compilers. I did a brief experiment before losing access to Ookami.
- `sve/blocked` with armclang: DoFs/Sec in CG: 1.43289 (1.43289) million
- `opt/blocked` with gcc: DoFs/Sec in CG: 0.729637 (0.729637) million
I am not sure why these numbers are so small compared to what we had Friday, but I think the poor performance with SVE might partly have something to do with the compiler. I have to apply for an account to do more experiments, though.
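To put the two figures above side by side, here is a minimal Python sketch that parses the quoted result lines and computes their ratio. The dictionary keys and the helper name `parse_mdofs` are made up for illustration and are not part of libCEED or its benchmark scripts. The two runs differ in both backend and compiler, so the ratio does not isolate either effect.

```python
import re

# The two result lines quoted above, copied verbatim.
# The dictionary keys are illustrative labels, not libCEED resource names.
RESULTS = {
    "sve/blocked (armclang)": "DoFs/Sec in CG: 1.43289 (1.43289) million",
    "opt/blocked (gcc)": "DoFs/Sec in CG: 0.729637 (0.729637) million",
}


def parse_mdofs(line: str) -> float:
    """Return the first DoFs/Sec figure (in millions) from a result line."""
    match = re.search(r"DoFs/Sec in CG:\s*([0-9.]+)", line)
    if match is None:
        raise ValueError(f"no DoFs/Sec figure in: {line!r}")
    return float(match.group(1))


rates = {name: parse_mdofs(line) for name, line in RESULTS.items()}

# Ratio of the armclang SVE run to the gcc opt run.
ratio = rates["sve/blocked (armclang)"] / rates["opt/blocked (gcc)"]
print(f"sve/armclang vs opt/gcc: {ratio:.2f}x")
```

On these two lines the ratio comes out a little under 2x in favor of the armclang run. A fair comparison would need all four backend and compiler combinations.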
Is this ready for review? Should we include it in v0.11?
The two big todos are fixing the makefile magic and seeing if this actually does anything different than OPT in terms of performance.
I noticed that libxsmm contains aarch64/SVE code and it's announced as supported for the next release.
Do we have a place where we can measure performance? There is a machine at Sandia that you can access if you put in a Sarape request and AWS c7g also has SVE. JLSE also has a system that I could try requesting.
Hi @jedbrown @jeremylt, if there is any way I can help here, let me know. Happy to take the time to test on AWS's c6g/c7g for you if there's a specific case you're interested in, and to compare against whatever libxsmm has for Arm optimizations.
I haven't had time to close the loop on this one. The two big todos are fixing the makefile detection of SVE support and running some basic performance comparisons to the `/cpu/self/opt` backends. I'm not sure of the best way to do the first, but we have a script to run our PETSc examples for the second.
Sounds good, no rush. I'd be very happy to run the PETSc benchmarks on whatever instance types you want whenever it is ready, if that would be helpful to you.
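On the makefile detection of SVE support mentioned above, one possible approach (an assumption, not what the libCEED Makefile currently does) is to ask the compiler for its predefined macros and look for the ACLE macro `__ARM_FEATURE_SVE`. The sketch below shows the idea in Python so the matching logic can be checked on its own. The function names are hypothetical, and the `-dM -E` invocation is assumed to behave as it does for GCC and Clang.

```python
import subprocess


def macros_define_sve(macro_dump: str) -> bool:
    """Return True if a `-dM -E` macro dump defines __ARM_FEATURE_SVE."""
    for line in macro_dump.splitlines():
        parts = line.split()
        # Match the exact macro name so __ARM_FEATURE_SVE2 or
        # __ARM_FEATURE_SVE_BITS alone do not count.
        if len(parts) >= 2 and parts[0] == "#define" and parts[1] == "__ARM_FEATURE_SVE":
            return True
    return False


def compiler_enables_sve(cc: str = "cc", flags: tuple = ()) -> bool:
    """Dump the compiler's predefined macros for the given flags and check for SVE."""
    try:
        result = subprocess.run(
            [cc, *flags, "-dM", "-E", "-x", "c", "/dev/null"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The compiler is not installed or not on PATH.
        return False
    return result.returncode == 0 and macros_define_sve(result.stdout)
```

A call such as `compiler_enables_sve("gcc", ("-march=armv8-a+sve",))` would then report whether that compiler and flag combination turns SVE on. The same check could be written as a shell one-liner inside a makefile. It only reflects what the compiler targets, not whether the machine running the build has SVE hardware.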