oneDNN

cpu: aarch64: add arbitrary ACL post ops

Open jondea opened this issue 2 years ago • 6 comments

Description

Adds class acl_post_ops_t which enables Compute Library for the Arm® Architecture (ACL) based primitives to have an arbitrary number and type of post ops by composing acl_binary_t and acl_eltwise_fwd_t. This class has been added to:

  • acl_inner_product_fwd_t
  • acl_matmul_t
  • Convolution: acl_gemm_convolution_fwd_t, acl_wino_convolution_fwd_t and acl_indirect_gemm_convolution_fwd_t

This replaces the existing post-op handling in these classes, which supported at most sum+eltwise, with a more general approach. The primitives still fuse eltwise operations where possible, as before.
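To illustrate the idea of composing an arbitrary chain of post ops, here is a minimal, self-contained sketch. It is not the code from this patch: `post_op_chain`, `append_eltwise`, `append_binary` and `run_example` are hypothetical names, and plain `std::vector<float>` buffers stand in for ACL tensors. The eltwise and binary steps only play the roles that acl_eltwise_fwd_t and acl_binary_t play in the real class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a post-op chain; NOT the real acl_post_ops_t.
class post_op_chain {
public:
    // Append an element-wise op such as ReLU (role of acl_eltwise_fwd_t).
    post_op_chain &append_eltwise(std::function<float(float)> f) {
        ops_.push_back([f](std::vector<float> &dst) {
            std::transform(dst.begin(), dst.end(), dst.begin(), f);
        });
        return *this;
    }

    // Append a binary op against a second tensor (role of acl_binary_t).
    post_op_chain &append_binary(std::function<float(float, float)> f,
            std::vector<float> src1) {
        ops_.push_back([f, src1](std::vector<float> &dst) {
            assert(dst.size() == src1.size()); // no broadcasting in this sketch
            for (std::size_t i = 0; i < dst.size(); ++i)
                dst[i] = f(dst[i], src1[i]);
        });
        return *this;
    }

    // Run every post op, in the order appended, in place on dst.
    void execute(std::vector<float> &dst) const {
        for (const auto &op : ops_)
            op(dst);
    }

    std::size_t size() const { return ops_.size(); }

private:
    std::vector<std::function<void(std::vector<float> &)>> ops_;
};

// Usage: binary add followed by ReLU, applied to a primitive's output.
inline std::vector<float> run_example() {
    std::vector<float> dst = {-2.f, 0.5f, 3.f};
    post_op_chain chain;
    chain.append_binary([](float a, float b) { return a + b; }, {1.f, 1.f, 1.f})
            .append_eltwise([](float x) { return std::max(x, 0.f); });
    chain.execute(dst);
    return dst;
}
```

Because each step is stored as an opaque callable, the chain can hold any number of ops in any order, which is the property the description attributes to acl_post_ops_t. Fusing an eltwise op into the base primitive, as the real primitives still do where possible, is outside the scope of this sketch.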

This patch also modifies the acl_eltwise_fwd_t primitive to support input/output tensors with any number of dimensions.
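As a rough illustration of why an element-wise operation need not depend on tensor rank, the sketch below treats a dense, contiguous tensor of any rank as a flat run of elements. This is an assumption made for illustration only; it does not describe how the patch changes acl_eltwise_fwd_t, and `num_elements` and `eltwise_any_rank` are hypothetical helpers.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Number of elements in a dense tensor with the given dimensions.
// An empty dims vector is treated as a scalar (one element).
inline std::size_t num_elements(const std::vector<std::size_t> &dims) {
    return std::accumulate(dims.begin(), dims.end(), std::size_t {1},
            std::multiplies<std::size_t>());
}

// Apply f in place to every element of a dense, contiguous tensor.
// The rank only matters for computing the total element count.
inline void eltwise_any_rank(float *data, const std::vector<std::size_t> &dims,
        const std::function<float(float)> &f) {
    const std::size_t n = num_elements(dims);
    for (std::size_t i = 0; i < n; ++i)
        data[i] = f(data[i]);
}
```

For example, a 2x3x4x5 tensor and a flat 120-element tensor are processed identically under this assumption, since both occupy 120 contiguous floats.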

These changes are covered by existing tests for eltwise and the base primitives (which include post-ops).

The overhead of the post ops is small, so the performance improvement for the newly supported cases is similar to the performance improvement of each base primitive over the reference implementations. Furthermore, acl_binary_t and acl_eltwise_fwd_t are several times faster than the reference implementation of binary and eltwise post ops.

Checklist

General

  • [X] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • [X] Have you formatted the code using clang-format?

Performance improvements

  • [X] Have you submitted performance data that demonstrates performance improvements?

jondea avatar Mar 24 '22 10:03 jondea

As far as I can tell, the pipeline failures are the same as on master

jondea avatar May 12 '22 08:05 jondea

> As far as I can tell, the pipeline failures are the same as on master

Yes, I created an issue for our team, thanks for the information.

igorsafo avatar May 12 '22 17:05 igorsafo

Thank you for the review @igorsafo

jondea avatar May 24 '22 14:05 jondea

Please don't merge this yet, we are investigating a possible regression.

jondea avatar Jun 16 '22 12:06 jondea

#1404 addresses one of the performance regressions. I will update this PR shortly to build on that one.

jondea avatar Jun 30 '22 12:06 jondea

The performance regressions have been addressed; I'm happy for this PR to be merged.

jondea avatar Jul 07 '22 10:07 jondea

@jondea @nSircombe The changes landed into master and were backported into rls-v2.7. Thanks for the contribution!

igorsafo avatar Aug 26 '22 21:08 igorsafo

Fantastic! Thanks @igorsafo.

nSircombe avatar Aug 28 '22 21:08 nSircombe

Thanks @igorsafo!

jondea avatar Aug 30 '22 06:08 jondea