Enable external brgemm API on aarch64
This change is an initial implementation of the external brgemm API for aarch64, based largely on the existing x64 code. For now, only f32 is supported, with support for int8 and bf16 (and further improvements) to come in later patches building on this baseline implementation.
To test this change, enable DNNL_EXPERIMENTAL_UKERNEL before compiling, then run benchdnn with --brgemm as usual. As an alternative to benchdnn, you can use examples/ukernels/cpu_brgemm.cpp, though it has to be changed slightly to use f32 instead of int8 and not set AB scales.
Also, can we make sure this is tested in CI somehow?
As it is right now, building with DNNL_EXPERIMENTAL_UKERNEL makes ./benchdnn --brgemm use the external brgemm API, so it would probably need to be a separate, new pipeline (if we want to keep testing the internal brgemm API). Alternatively, we could maybe not replace internal brgemm benchdnn with the ukernel version on aarch64, and just write a suite of tests similar to examples/ukernels/cpu_brgemm.cpp for the CI. Either way, it's not gonna be as easy as just enabling the ukernel in the CI.
I'd lean towards just enabling DNNL_EXPERIMENTAL_UKERNEL where we can. My expectation is that when you call benchdnn --something you are directly testing external things. Testing the internal brgemm interface feels like a fallback rather than an intentional thing. The internal brgemm will continue to be tested via the external API, and more directly where it is actually used, e.g. brgemm_matmul, brgconv, etc.
Now that the API is enabled, I expected the example examples/ukernels/cpu_brgemm.cpp to be functional. Still, when running it after enabling -DDNNL_EXPERIMENTAL_UKERNEL=ON, it returns the error Kernel is not supported on this platform after querying brgemm::get_B_pack_type. Is this behaviour expected?
Yea, that's expected for now. This PR only enables the f32 path for the moment, with bf16 and int8 to follow later. The cpu_brgemm.cpp example runs an int8 brgemm, so that particular kernel is not supported. It can quite easily be modified to run fine if you change dt to f32 and remove the setting of A/B scales (I think the binary post-op has to be disabled too; that's an unrelated limitation of brgemm on aarch64).
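For anyone checking the f32 path by hand with a modified example, a naive reference can be useful to compare the output against. The sketch below is not part of this PR or of oneDNN: `ref_brgemm_f32` and `all_close` are hypothetical helper names, and it assumes dense row-major matrices with no post-ops or scales, i.e. it only computes C = sum over the batch of A_b * B_b.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive f32 batch-reduce GEMM reference (hypothetical helper, not oneDNN API).
// Each A[b] is a dense row-major M x K matrix, each B[b] a dense row-major
// K x N matrix. Returns the row-major M x N sum over b of A[b] * B[b].
std::vector<float> ref_brgemm_f32(const std::vector<std::vector<float>> &A,
        const std::vector<std::vector<float>> &B, std::size_t M,
        std::size_t N, std::size_t K) {
    assert(A.size() == B.size());
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t b = 0; b < A.size(); ++b) {
        assert(A[b].size() == M * K);
        assert(B[b].size() == K * N);
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t k = 0; k < K; ++k)
                for (std::size_t n = 0; n < N; ++n)
                    C[m * N + n] += A[b][m * K + k] * B[b][k * N + n];
    }
    return C;
}

// Element-wise comparison with an absolute tolerance, for checking kernel
// output against the reference above.
bool all_close(const std::vector<float> &x, const std::vector<float> &y,
        float tol) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (std::fabs(x[i] - y[i]) > tol) return false;
    return true;
}
```

The output buffer of the modified example can then be copied into a `std::vector<float>` and passed to `all_close` together with the reference result. A looser tolerance may be needed for large K, since the accumulation order of the kernel will generally differ from this triple loop.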
@dzarukin we need a review from the onednn-arch team, could you have a look please? Thanks!