oneDNN
aarch64: brgemm: add jit impl. for sve_512
Description
This patch adds a JIT implementation of brgemm for AArch64 SVE512.
Please merge after applying the following pull requests: #1461, #1462.
Checklist
General
- [x] Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?
- [x] Have you formatted the code using clang-format?
Thank you for your comment.
I think you forgot to add the inner_product implementation to cpu_inner_product_list.cpp.
I have added the inner_product implementation to cpu_inner_product_list.cpp.
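For context, the registration in cpu_inner_product_list.cpp boils down to adding the new implementation to an ordered dispatch list that is walked until one implementation accepts the problem. A minimal self-contained sketch of that idea follows; problem_t, the ISA flag, and the factory signature are illustrative assumptions, not oneDNN's real types:

```cpp
#include <functional>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical problem descriptor; real dispatching checks data types,
// formats, and runtime CPU features.
struct problem_t {
    bool is_f32;
    bool cpu_has_sve512; // assumption: some runtime ISA query exists
};

using impl_check_t = std::function<bool(const problem_t &)>;

// Ordered candidate list: the brgemm-based SVE512 implementation is
// tried first, with a reference implementation as the fallback.
static const std::vector<std::pair<const char *, impl_check_t>> ip_impl_list = {
        {"brgemm_inner_product:sve_512",
                [](const problem_t &p) { return p.is_f32 && p.cpu_has_sve512; }},
        {"ref_inner_product", [](const problem_t &) { return true; }},
};

int main() {
    problem_t p{/*is_f32=*/true, /*cpu_has_sve512=*/true};
    for (const auto &impl : ip_impl_list)
        if (impl.second(p)) {
            std::cout << "dispatched to " << impl.first << "\n";
            break;
        }
}
```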
Also, could you clarify why you are keeping the whole AMX logic from x64? One of the benefits of the brgemm approach is that it should allow you to "just" reimplement the [brgemm API](https://github.com/oneapi-src/oneDNN/blob/master/src/cpu/x64/brgemm/brgemm.hpp) in the best way possible for your arch, and then reuse the same driver across architectures. I think this would simplify the whole brgemm support on the ARM side.
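To illustrate the split being suggested, here is a minimal sketch with illustrative types that are not oneDNN's actual brgemm API: the kernel hides behind a small interface, so the same driver code can be reused across x64 and AArch64:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// One A/B pair of the batch; brgemm accumulates sum_i A_i * B_i into C.
struct brgemm_batch_element_t {
    const float *A;
    const float *B;
};

// Small interface the architecture-agnostic driver programs against.
struct brgemm_kernel_t {
    virtual ~brgemm_kernel_t() = default;
    virtual void execute(const std::vector<brgemm_batch_element_t> &batch,
            float *C) const = 0;
};

// Architecture-specific kernel: on AArch64 this is where SVE512 JIT
// code generation would live; here, a plain-C++ stand-in.
struct brgemm_sve512_t final : brgemm_kernel_t {
    size_t M, N, K;
    brgemm_sve512_t(size_t M, size_t N, size_t K) : M(M), N(N), K(K) {}
    void execute(const std::vector<brgemm_batch_element_t> &batch,
            float *C) const override {
        for (const auto &e : batch)
            for (size_t m = 0; m < M; ++m)
                for (size_t n = 0; n < N; ++n)
                    for (size_t k = 0; k < K; ++k)
                        C[m * N + n] += e.A[m * K + k] * e.B[k * N + n];
    }
};

int main() {
    const size_t M = 2, N = 2, K = 2;
    std::vector<float> A(M * K, 1.0f), B(K * N, 2.0f), C(M * N, 0.0f);
    brgemm_sve512_t kernel(M, N, K);
    // The driver only sees brgemm_kernel_t, so the same driver code is
    // reusable regardless of which architecture supplied the kernel.
    const brgemm_kernel_t &iface = kernel;
    iface.execute({{A.data(), B.data()}}, C.data());
    std::cout << "C[0] = " << C[0] << "\n"; // 1*2 + 1*2 = 4
}
```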
I temporarily left the AMX implementation in place to avoid build errors. Since the ARM side does not use it, I have now removed the AMX implementation.
Hi @Koji-Kurihara ,
There were recent changes to quantization support described in this RFC: 20220201-quantization-scaling
Please review the x86 version of brgemm and update aarch64 version to make it work for int8.
The main changes are:
- Now scaling relies on per-argument scales instead of a single output scale.
- To reuse brgemm in other primitives, it currently still takes a single pointer to `scales`, but a primitive that uses brgemm should precompute that scale by multiplying src_scale and wei_scale (see the sketch after this list).
- dst scale is not yet supported for brgemm kernels.
- Bias is applied after dequantization (after applying input scales) and doesn't need to be quantized by the user.
- Scaling is runtime only, so scale factors are passed at execution time as separate memory objects; they are not known at creation time.
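A minimal sketch of that scale handling, assuming per-output-channel weight scales and hypothetical names (precompute_scales is not an actual oneDNN function):

```cpp
#include <cstdio>
#include <vector>

// combined[oc] = src_scale * wei_scales[oc]; dst scale is intentionally
// not folded in, since it is not yet supported for brgemm kernels.
std::vector<float> precompute_scales(
        float src_scale, const std::vector<float> &wei_scales) {
    std::vector<float> combined(wei_scales.size());
    for (size_t i = 0; i < wei_scales.size(); ++i)
        combined[i] = src_scale * wei_scales[i];
    return combined;
}

int main() {
    // Scales are runtime-only: they arrive at execution time as separate
    // memory objects, so this precomputation happens at execution, not
    // at primitive creation.
    float src_scale = 0.5f;
    std::vector<float> wei_scales = {0.1f, 0.2f}; // per output channel
    auto scales = precompute_scales(src_scale, wei_scales);

    // Bias is applied after dequantization (after input scales):
    //   dst[oc] = scales[oc] * acc_int32 + bias[oc]
    int acc = 100;     // int8*int8 accumulator for channel 0
    float bias = 1.0f; // f32 bias, not quantized by the user
    float dst = scales[0] * acc + bias;
    std::printf("dst = %f\n", dst); // 0.05 * 100 + 1 = 6.0
}
```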
Hi @Koji-Kurihara @kawakami-k , Do you plan to make any additional changes to this PR?
Hi @igorsafo , I'm sorry, I have not been able to work on this issue due to lack of time. We will work on it before the branch cutoff for v3.1.
@Koji-Kurihara Thanks for the update!
Hi @igorsafo, I am developing brgemm to work with int8. However, it will take more time to complete. So, could you please merge the current source code implementing brgemm that works with f32? I will send out another pull request for brgemm that works with int8 as soon as I finish development.
Maybe I am missing something, but I don't see the changes mentioned in your comments (removing amx_tileconfigure headers, printf, ...). Did you push those changes?
Hi @mgouicem, could you take a look at bcd4f7d? That commit contains those changes.
Hi @mgouicem, Thank you for your comment. I updated the copyright header and removed redundant descriptions. a796e94 contains these changes.
One last thing: please fix formatting using clang-format.
Thank you for your comment. I fixed the formatting using clang-format in f4e74db.
I'm modifying the SVE512 implementation to match the latest x64 implementation. Please allow a week.
Hello @igorsafo @mgouicem @densamoilov, I have raised a new PR with the same brgemm JIT implementation support. Link to PR: https://github.com/oneapi-src/oneDNN/pull/1818 Could you please close this PR? Thanks
Link to the new version: https://github.com/oneapi-src/oneDNN/pull/1818