cpu: aarch64: modify acl_pooling for stateless functions
Description
Make pooling use stateless pooling op from ACL.
Checklist
General
- [x] Do all unit and benchdnn tests (
make testandmake test_benchdnn_*) pass locally for each commit? - [x] Have you formatted the code using clang-format?
Performance improvements
- [x] Have you submitted performance data that demonstrates performance improvements?
./tests/benchdnn/benchdnn --pool --mode=P --dt=f16:f16 --tag=axb --alg=avg_np ic64ih56oh56kh3ph1
stateful:
OMP_NUM_THREADS=4
total perf: min(ms):0.301025 avg(ms):0.318544
OMP_NUM_THREADS=16
total perf: min(ms):0.121338 avg(ms):0.131665
stateless:
OMP_NUM_THREADS=4
total perf: min(ms):0.304443 avg(ms):0.320418
OMP_NUM_THREADS=16
total perf: min(ms):0.123047 avg(ms):0.133939
Thanks for the patch @zhili03. Could you please resolve the formatting failures?
Also nit: we usually prefix the aarch64 changes' commit messages with cpu: aarch64 rather than just cpu:. This makes it easier to grep for in the git logs as well.
Hi @Sqvid , thanks for it, will fix the format issue and commit messages in my next commit
Hi Zhibo, could you add performance numbers and the corresponding benchdnn command? Thank you!
Also, we need to remove the change id from the PR message and commit message thanks!
Hi Ryo, sure, will do in the next commit as well, thanks!
Hi @Sqvid @Ryo-not-rio some CI pipelines are failing because the stateless pooling function haven't been merge to ACL yet, does that mean this PR cannot be merged at least until next ACL's release? Thanks
@zhili03, @Sqvid, is this PR still relevant?
@vpirogov Yes, still in review
For the most recent CI failure, that failed test case passed on my local develop machine,
./tests/benchdnn/benchdnn --pool --tag=axb --alg=avg_np ic5ih15oh14kh3ph1n"pool_ci_2d:only_pad_l" 0:PASSED (11 ms) __REPRO: --pool --tag=axb --alg=avg_np ic5ih15oh14kh3ph1npool_ci_2d:only_pad_l tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0 total: 0.01s; create_pd: 0.00s (3%); create_prim: 0.00s (1%); fill: 0.01s (45%); execute: 0.00s (4%); compute_ref: 0.00s (4%); compare: 0.00s (7%);. Could reviewers @Sqvid @Ryo-not-rio @almayne help look into this? Thanks
For the most recent CI failure, that failed test case passed on my local develop machine
I believe the one you posted was the last successful test. Since you hit an assert() the failing REPRO line was not hit.
Here is a failing case:
./build/tests/benchdnn/benchdnn --pool --dt=f16:f16 --tag=axb --alg=avg_np ic64ih56oh56kh3ph1
The failing assert() is here:
https://github.com/uxlfoundation/oneDNN/blob/2945509677ef27f475d9cd8d26059af8317161dc/src/common/memory_tracking.hpp#L399
You will need to compile with asserts enabled. i.e, with CMAKE_BUILD_TYPE={Debug|RelWithAssert}.
LGTM, thanks!
Thanks for your help :)
@dzarukin Would appreciate a review from @uxlfoundation/onednn-arch, thanks!