Arm Kernels & Configurations
- [x] Arm64 (Cortex-A53 or higher, ThunderX2, etc.)
- [x] Arm32 (Cortex-A9, Cortex-A15, etc.)
- [x] Arm64+SVE (A64fx, Neoverse, etc.)
Comments
- The approach from flame/blis#344 for determining the Arm64 implementation is (partly) adopted here.
- ~I'm creating configs `armv7a` and `armv8a` because config `cortexa53` is roughly the same as `cortexa57` in BLIS config (similarly: `cortexa9` is close to `cortexa15`). So is it good to now remove `cortex-a9` and `cortex-a15` lines in `configure.ac`? Or should I rename current `armv8a` to `cortexa53` to reproduce the current BLIS layout?~ See a2a42ce comments.
@xrq-phys Thanks for this. armv8a is fine as long as it is generic enough across those uarchs (e.g. do Cortex-A53 and ThunderX2 share blocking parameters?). The only confusion is with A64fx et al., since those are also technically armv8a.
BTW the way kernels work in TBLIS is slightly different than in BLIS: a separate config is needed for each uarch that requires distinct blocking parameters or other settings, but multiple configs can share kernels by simply including the proper prototypes and adding a conditional block in the Makefile.am. See the AMD configs for examples.
Thanks for the comments.
In fact, Cortex-A53 and ThunderX2 share the same block sizes, but I want to further add TBLIS_CONFIG_?_THREAD_RATIO and TBLIS_CONFIG_?R_MAX_THREAD lines for ThunderX2, so I made it a separate config.
> BTW the way kernels work in TBLIS is slightly different than in BLIS: ...
I see! In fact I've already made both armv8a and armv7a work. Just unsure about threading correctness & Autoconf coding style...
Noticed that TBLIS requires block sizes to be compile-time constants (i.e. constexprs).
Currently instantiating configs for 2 VLs (256 and 512 bits), since VL > 512 is not seen on any public roadmaps at the moment. For 128 bits, the SVE GEMM kernels would be no different from the NEON ones.
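A rough sketch of what "one config per VL with constexpr block sizes" could look like. Everything here is hypothetical (`sve_config_info`, `sve_config_sketch`, `select_sve_config`, and the MR/NR formulas are placeholders, not the actual TBLIS configs or the values used in this PR); on real hardware the `vl_bits` argument would come from a runtime vector-length query.

```cpp
#include <cassert>

// Hypothetical descriptor for illustration only; not a TBLIS type.
struct sve_config_info
{
    int vl_bits; // SVE vector length this config was instantiated for
    int mr;      // register block size in the m dimension
    int nr;      // register block size in the n dimension
};

// One instantiation per vector length. The block sizes are constexpr,
// so each instantiation fixes them at compile time.
template <int VLBits>
struct sve_config_sketch
{
    static_assert(VLBits % 128 == 0, "SVE vector lengths are multiples of 128 bits");

    static constexpr int lanes_f64 = VLBits / 64;  // doubles per vector
    static constexpr int MR = 2 * lanes_f64;       // assumption: placeholder formula
    static constexpr int NR = 10;                  // assumption: placeholder value

    static constexpr sve_config_info info() { return {VLBits, MR, NR}; }
};

// Pick one of the pre-instantiated configs from a runtime vector length.
// Returns nullptr when no SVE config applies (e.g. 128 bits, where the
// NEON config would be used instead).
inline const sve_config_info* select_sve_config(int vl_bits)
{
    assert(vl_bits > 0);

    static const sve_config_info cfg256 = sve_config_sketch<256>::info();
    static const sve_config_info cfg512 = sve_config_sketch<512>::info();

    switch (vl_bits)
    {
        case 256: return &cfg256;
        case 512: return &cfg512;
        default:  return nullptr;
    }
}
```

The trade-off is that every supported VL costs one more instantiation, which stays manageable while only 256 and 512 bits need covering.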
> Noticed that TBLIS requires block sizes to be compile-time constants (i.e. constexprs).
Yes, although it would be possible to kludge runtime numbers in there. MR/NR do actually have to be compile-time constants as they are used as template parameters.
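To illustrate why MR/NR have to be compile-time constants, here is a minimal sketch of a reference micro-kernel that takes them as template parameters. The names (`ref_gemm_ukr`, `example_config`, `run_ukr`) and the packed layout are assumptions made for this example, not the real TBLIS kernel interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference micro-kernel computing C += A * B on one MR x NR tile.
// A is packed as k columns of MR elements, B as k rows of NR elements.
template <typename T, int MR, int NR>
void ref_gemm_ukr(std::size_t k, const T* a, const T* b,
                  T* c, std::size_t rs_c, std::size_t cs_c)
{
    static_assert(MR > 0 && NR > 0, "register block sizes must be positive");

    // The accumulator tile has a size known at compile time, which is
    // only possible because MR and NR are template parameters.
    T ab[MR][NR] = {};

    for (std::size_t p = 0; p < k; p++)
        for (int i = 0; i < MR; i++)
            for (int j = 0; j < NR; j++)
                ab[i][j] += a[p * MR + i] * b[p * NR + j];

    for (int i = 0; i < MR; i++)
        for (int j = 0; j < NR; j++)
            c[i * rs_c + j * cs_c] += ab[i][j];
}

// Hypothetical config: the block sizes are constexpr members, so they
// can be forwarded as template arguments.
struct example_config
{
    static constexpr int MR = 4;
    static constexpr int NR = 2;
};

template <typename Config>
void run_ukr(std::size_t k, const double* a, const double* b, double* c)
{
    // Row-major MR x NR tile: row stride NR, column stride 1.
    ref_gemm_ukr<double, Config::MR, Config::NR>(k, a, b, c, Config::NR, 1);
}
```

A runtime MR or NR could not be used in `ref_gemm_ukr<double, MR, NR>`; the other block sizes are not template arguments in this sketch.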
After fixing the *beta == 0 case, all tests now pass for both Armv8a and ArmSVE :D
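The thread does not spell out what the beta == 0 bug was. A common pitfall in GEMM micro-kernels is reading C when beta is zero, which lets NaN or uninitialized values in C leak into the result, since 0 * NaN is NaN. A minimal sketch of the usual handling, assuming that was the issue (`update_c_tile` and its signature are hypothetical, not the kernel code in this PR):

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical update step: C = alpha * AB + beta * C on an m x n tile,
// where AB is the row-major m x n accumulated product.
template <typename T>
void update_c_tile(int m, int n, const T* alpha, const T* ab,
                   const T* beta, T* c, int rs_c, int cs_c)
{
    assert(m >= 0 && n >= 0);

    for (int i = 0; i < m; i++)
    {
        for (int j = 0; j < n; j++)
        {
            T& cij = c[i * rs_c + j * cs_c];
            const T t = *alpha * ab[i * n + j];

            if (*beta == T(0))
                cij = t;               // overwrite: C is never read
            else
                cij = t + *beta * cij; // general case: scale and accumulate
        }
    }
}
```

With beta == 0 the tile is overwritten without being read, so a C buffer full of NaN still yields finite results.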