oneDNN
cpu: aarch64: add arbitrary ACL post ops
Description
Adds class acl_post_ops_t, which enables Compute Library for the Arm® Architecture (ACL) based primitives to have an arbitrary number and type of post ops by composing acl_binary_t and acl_eltwise_fwd_t.
This class has been added to:
- acl_inner_product_fwd_t
- acl_matmul_t
- Convolution: acl_gemm_convolution_fwd_t, acl_wino_convolution_fwd_t and acl_indirect_gemm_convolution_fwd_t
This replaces the existing functionality in these classes, which previously supported at most sum+eltwise, with a more generalised approach. The primitives still fuse eltwise operations where possible, as before.
This patch also modifies the acl_eltwise_fwd_t primitive to support input/output tensors with any number of dimensions.
These changes are covered by existing tests for eltwise and the base primitives (which include post-ops).
The overhead of the post ops is small, so the performance improvement for the newly supported cases is similar to the performance improvement of each base primitive over the reference implementations. Furthermore, acl_binary_t and acl_eltwise_fwd_t are several times faster than the reference implementations of the binary and eltwise post ops.
Checklist
General
- [X] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
- [X] Have you formatted the code using clang-format?
Performance improvements
- [X] Have you submitted performance data that demonstrates performance improvements?
As far as I can tell, the pipeline failures are the same as on master
Yes, I created an issue for our team, thanks for the information.
Thank you for the review @igorsafo
Please don't merge this yet, we are investigating a possible regression.
#1404 addresses one of the performance regressions. I will update this PR shortly to build on that one.
The performance regressions have been addressed; I'm happy for this PR to be merged.
@jondea @nSircombe The changes landed into master and were backported into rls-v2.7. Thanks for the contribution!
Fantastic! Thanks @igorsafo.
Thanks @igorsafo!