oneDNN
cpu: aarch64: add arbitrary ACL post ops
Description
Adds class acl_post_ops_t, which enables Compute Library for the Arm® Architecture (ACL) based primitives to have an arbitrary number and type of post ops by composing acl_binary_t and acl_eltwise_fwd_t.
This class has been added to:
- acl_inner_product_fwd_t
- acl_matmul_t
- Convolution: acl_gemm_convolution_fwd_t, acl_wino_convolution_fwd_t and acl_indirect_gemm_convolution_fwd_t
This replaces the existing functionality in these classes, which previously supported at most sum+eltwise, with a more generalised approach. The primitives still fuse eltwise operations where possible, as before.
This patch also modifies the acl_eltwise_fwd_t primitive to support input/output tensors with any number of dimensions.
These changes are covered by existing tests for eltwise and the base primitives (which include post-ops).
The overhead of the post ops is small, so the performance improvement for the newly supported cases is similar to the performance improvement of each base primitive over the reference implementations. Furthermore, acl_binary_t and acl_eltwise_fwd_t are several times faster than the reference implementations of the binary and eltwise post ops.
Checklist
General
- [X] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
- [X] Have you formatted the code using clang-format?
Performance improvements
- [X] Have you submitted performance data that demonstrates performance improvements?
As far as I can tell, the pipeline failures are the same as on master
Yes, I created an issue for our team, thanks for the information.
Thank you for the review @igorsafo
Please don't merge this yet, we are investigating a possible regression.
#1404 addresses one of the performance regressions. I will update this PR shortly to build on that one.
The performance regressions have been addressed; I'm happy for this PR to be merged.
@jondea @nSircombe The changes landed into master and were backported into rls-v2.7. Thanks for the contribution!
Fantastic! Thanks @igorsafo.
Thanks @igorsafo!