Milos Puzovic comments

Results: 8 comments of Milos Puzovic

remapper_test fails on AARCH64 with --config=mkl_aarch64_threadpool

@elfringham, thanks for your fix. > @milpuz01 Do these features still need to be disabled for the best performance with ACL? Yes, they are still needed. Unfortunately, we should have...

ACL operators need to be made stateless to avoid runtime initialization overhead

We are about to start the exploratory work for 24.05 to integrate two oneDNN primitives, convolution and matrix multiplication, with the existing non-public ACL API for using stateless objects in...

[oneDNN] Quantized batch-matmul and fusions

Hi @mdfaijul, as @elfringham identified, the failure occurs in the `Setup()` method of the `MklMatMulPrimtive` class when you are trying to set `b_mem` in `context_` by reading the weight description from `prim_desc` that...

Upgrade submodule oneDNN to v3.4

> @milpuz01, suppose you've also kicked off some tests for ARM platforms as per previous email communication. Could you share any test result if available? Hi @Guobing-Chen, we still do...

Upgrade submodule oneDNN to v3.4

@malfet @atalman On the Arm side, @murste01 is setting up the same workflow that Intel uses to validate the performance of oneDNN. He will take this discussion with @Guobing-Chen and @Xia-Weiwen...

NEGEMMLowpMatrixMultiplyCore: performance issue int8 vs fp16

INT8 kernels use MMLA instructions instead of the MLA instructions that FP32 and FP16 use, and they work the core/memory system much harder. As a result, with MMLA we can get anywhere between...

NEGEMM needs `configure` calling for each `run`

Hi @alvoron, > We'll have a look at the code in OneDNN. Since oneDNN 3.6 we have been using stateless operators from ACL, so in the implementation of the oneDNN `matmul` primitive we...

ACL: INT8 3x3 slower than F32 on small-channel conv (Neoverse-V2, SVE2)

cc @morgolock