SVE Backend

Open jeremylt opened this issue 3 years ago • 9 comments

jeremylt avatar Nov 11 '21 14:11 jeremylt

This now compiles and passes the t3 tests on Ookami.

ToDo:

  • [ ] Performance comparisons
  • [x] Makefile flag fix (not sure how the AVX one even works)
  • [ ] Improved vectorization instructions?
  • [x] Add SVE backend to README
  • [x] Do we want manual unrolling for opt? The unrolling is pretty straightforward
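For readers unfamiliar with the term, manual unrolling can be sketched as follows. This is a generic illustration on a simple `y += a * x` loop, not code from libCEED's opt or SVE backends; the function names `axpy_plain` and `axpy_unrolled4` and the unroll factor of 4 are assumptions chosen for the example.

```c
#include <stddef.h>

// Reference loop: y[i] += a * x[i], one element per iteration.
static void axpy_plain(size_t n, double a, const double *x, double *y) {
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

// Same operation, manually unrolled by a factor of 4.
// The main loop handles full groups of 4; the tail loop handles
// the remaining 0 to 3 elements when n is not a multiple of 4.
static void axpy_unrolled4(size_t n, double a, const double *x, double *y) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    y[i + 0] += a * x[i + 0];
    y[i + 1] += a * x[i + 1];
    y[i + 2] += a * x[i + 2];
    y[i + 3] += a * x[i + 3];
  }
  for (; i < n; i++) y[i] += a * x[i];
}
```

Both functions perform the same per-element arithmetic, so they should give matching results; whether the unrolled form is actually faster depends on the compiler and target and would need to be measured.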

jeremylt avatar Nov 12 '21 17:11 jeremylt

I remember from Ookami's talk that performance with GCC was the worst among all compilers. I did a brief experiment before losing access to Ookami:

  • sve/blocked with armclang: DoFs/Sec in CG: 1.43289 (1.43289) million
  • opt/blocked with gcc: DoFs/Sec in CG: 0.729637 (0.729637) million

I am not sure why these numbers are so small compared to what we had on Friday, but I think the poor performance with sve might partly be due to the compiler. I will have to apply for an account to run more experiments, though.

LeilaGhaffari avatar Nov 15 '21 17:11 LeilaGhaffari

Is this ready for review? Should we include it in v0.11?

jedbrown avatar Sep 06 '22 02:09 jedbrown

The two big todos are fixing the makefile magic and seeing if this actually does anything different than OPT in terms of performance.

jeremylt avatar Sep 06 '22 14:09 jeremylt

I noticed that libxsmm contains aarch64/SVE code and it's announced as supported for the next release.

jedbrown avatar Oct 27 '22 15:10 jedbrown

Do we have a place where we can measure performance? There is a machine at Sandia that you can access if you put in a Sarape request and AWS c7g also has SVE. JLSE also has a system that I could try requesting.

jedbrown avatar Oct 27 '22 15:10 jedbrown

Hi @jedbrown @jeremylt, if there is any way I can help here, let me know. I'm happy to take the time to test on AWS's c6g/c7g for you if there's a specific case you're interested in, and to compare against whatever Arm optimizations libxsmm has.

sebastiangrimberg avatar Mar 03 '23 03:03 sebastiangrimberg

I haven't had time to close the loop on this one. The two big todos are fixing the makefile detection of SVE support and running some basic performance comparisons against the /CPU/self/opt backends. I'm not sure of the best way to do the first, but we have a script to run our PETSc examples for the second.
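As a possible starting point for the detection question, a build system could compile a tiny probe with the candidate flags and check the ACLE macro `__ARM_FEATURE_SVE`, which compilers define when SVE code generation is enabled (for example with `-march=armv8-a+sve`). The sketch below is only an illustration of that idea, not what the libCEED makefile does; the helper name `sve_enabled_at_compile_time` is hypothetical.

```c
#include <stdio.h>

// Returns 1 if this translation unit was compiled with SVE enabled,
// as reported by the ACLE feature macro, and 0 otherwise.
// Hypothetical helper for illustration; not part of libCEED.
static int sve_enabled_at_compile_time(void) {
#if defined(__ARM_FEATURE_SVE)
  return 1;
#else
  return 0;
#endif
}

// Prints a one-line summary that a build script could parse.
static void report_sve_support(void) {
  printf("SVE enabled at compile time: %s\n",
         sve_enabled_at_compile_time() ? "yes" : "no");
}
```

Note that this only reports what the compiler was asked to target; it does not check whether the machine running the binary actually implements SVE.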

jeremylt avatar Mar 03 '23 03:03 jeremylt

Sounds good, no rush. I'd be very happy to run the PETSc benchmarks on whatever instance types you want whenever it is ready, if that would be helpful to you.

sebastiangrimberg avatar Mar 03 '23 04:03 sebastiangrimberg