Edward Shogulin

Results: 21 issues by Edward Shogulin

### Details:
- *[DOC] Quantization Scheme*

category: docs

### Details:
- *[ARM] [INT8] FullyConnected*

### Tickets:
- *ticket-id*

category: IE Tests
category: GPU
category: CPU
category: build

### Context
[JIT Emitters](https://github.com/openvinotoolkit/openvino/blob/42f1cb095143f19c0b9ee25836c29748bc8d9bf2/src/plugins/intel_cpu/src/emitters/README.md) are part of the code generation feature (a.k.a. tensor compiler) that automatically produces highly efficient, optimized binary code for fused subgraphs. Each emitter implements a specific operation from low-level...
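To make the emitter idea concrete, here is a minimal, purely illustrative C++ sketch of the pattern: each emitter knows the arity of its operation and appends code for that operation into a shared buffer. The `Emitter`/`AddEmitter`/`CodeBuffer` names below are hypothetical and are not the actual OpenVINO `jit_emitter` interface; see the linked README for the real one.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical code buffer: in a real JIT this would hold target ISA instructions.
struct CodeBuffer {
    std::vector<unsigned char> bytes;
    void append(unsigned char b) { bytes.push_back(b); }
};

// Hypothetical emitter interface: each emitter reports how many inputs its
// operation takes and knows how to append code for it (illustration only,
// not the OpenVINO intel_cpu jit_emitter API).
class Emitter {
public:
    virtual ~Emitter() = default;
    virtual std::size_t inputs_num() const = 0;
    virtual void emit(CodeBuffer& buf,
                      const std::vector<std::size_t>& in_regs,
                      std::size_t out_reg) const = 0;
};

// A toy "add" emitter: a real one would emit a vector add instruction for the
// target ISA; here it just appends placeholder bytes.
class AddEmitter : public Emitter {
public:
    std::size_t inputs_num() const override { return 2; }
    void emit(CodeBuffer& buf, const std::vector<std::size_t>& in_regs,
              std::size_t out_reg) const override {
        buf.append(0x01);  // placeholder "add" opcode
        buf.append(static_cast<unsigned char>(in_regs[0]));
        buf.append(static_cast<unsigned char>(in_regs[1]));
        buf.append(static_cast<unsigned char>(out_reg));
    }
};

int main() {
    CodeBuffer buf;
    AddEmitter add;
    add.emit(buf, {0, 1}, 2);  // "add reg0, reg1 -> reg2"
    std::printf("emitted %zu bytes\n", buf.bytes.size());
}
```

In the real plugin the emitted code consists of target ISA instructions rather than placeholder bytes, but the overall shape (per-operation emitters feeding one generated fused kernel) is the same.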

good first issue
category: CPU
no_stale

### Context
[JIT Emitters](https://github.com/openvinotoolkit/openvino/blob/42f1cb095143f19c0b9ee25836c29748bc8d9bf2/src/plugins/intel_cpu/src/emitters/README.md) are part of the code generation feature (a.k.a. tensor compiler) that automatically produces highly efficient, optimized binary code for fused subgraphs. Each emitter implements a specific operation from low-level...

good first issue
category: CPU
platform: arm
no_stale

### Context
[JIT Emitters](https://github.com/openvinotoolkit/openvino/blob/42f1cb095143f19c0b9ee25836c29748bc8d9bf2/src/plugins/intel_cpu/src/emitters/README.md) are part of the code generation feature (a.k.a. tensor compiler) that automatically produces highly efficient, optimized binary code for fused subgraphs. Each emitter implements a specific operation from low-level...

good first issue
category: CPU
platform: arm
no_stale

Issues:
1. The low-precision kernel selected by default is not optimal for the platform described below.
2. We get only a 30% performance gain for the low-precision kernel vs. fp16 in multithreaded mode...

Question

According to the [NEGEMMLowpMatrixMultiplyCore](https://arm-software.github.io/ComputeLibrary/v24.06/classarm__compute_1_1_n_e_g_e_m_m_lowp_matrix_multiply_core.xhtml) documentation, only limited combinations of `QSYMM8` and `QASYMM8_SIGNED` precisions are supported on the inputs:

src0 | src1 | src2 | dst
-- | -- | -- | --
...
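As a hedged sketch (assuming ARM Compute Library v24.06 and its public static `NEGEMMLowpMatrixMultiplyCore::validate` entry point), one way to check up front whether a given precision combination is accepted is shown below; the shapes and quantization parameters are illustrative assumptions.

```cpp
#include <cstdio>

#include "arm_compute/core/QuantizationInfo.h"
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/functions/NEGEMMLowpMatrixMultiplyCore.h"

using namespace arm_compute;

int main() {
    // Illustrative shapes: ACL's TensorShape is (width, height, ...), so src0 here is M=32 x K=64.
    const TensorInfo src0(TensorShape(64U, 32U), 1, DataType::QASYMM8_SIGNED,
                          QuantizationInfo(0.05f, 0));
    const TensorInfo src1(TensorShape(16U, 64U), 1, DataType::QASYMM8_SIGNED,
                          QuantizationInfo(0.1f, 0));
    // S32 destination keeps the raw accumulators; requantization can be configured separately.
    const TensorInfo dst(TensorShape(16U, 32U), 1, DataType::S32);

    // validate() reports whether this src0/src1/dst combination is supported,
    // without configuring or allocating anything.
    const Status status = NEGEMMLowpMatrixMultiplyCore::validate(&src0, &src1, nullptr, &dst);
    std::printf("%s\n", bool(status) ? "combination supported"
                                     : status.error_description().c_str());
    return 0;
}
```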

Help wanted
Feature Request

Model:

```mermaid
graph TD;
  Input1["Input src1: fp32"]
  Quantise1["NEQuantizationLayer q_src1: QASYMM8_SIGNED"]
  Input2["Input src2: fp32"]
  Quantise2["NEQuantizationLayer q_src2: QASYMM8_SIGNED"]
  MatMul["NEGEMMLowpMatrixMultiplyCore q_res: S8"]
  Input1-->Quantise1;
  Input2-->Quantise2;
  Quantise1-->MatMul;
  Quantise2-->MatMul;
  MatMul-->Result;
```

Can you confirm that `NEGEMMLowpMatrixMultiplyCore`...
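For reference, a minimal runtime sketch of that graph using ARM Compute Library's NEON functions (`NEQuantizationLayer` feeding `NEGEMMLowpMatrixMultiplyCore`) could look like the following. The shapes, scales, and the `S32` destination type are assumptions for illustration; whether the destination can be `S8` directly, as drawn above, is exactly the open question.

```cpp
#include "arm_compute/core/QuantizationInfo.h"
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/runtime/NEON/functions/NEGEMMLowpMatrixMultiplyCore.h"
#include "arm_compute/runtime/NEON/functions/NEQuantizationLayer.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main() {
    Tensor src1, src2, q_src1, q_src2, q_res;

    // fp32 inputs (illustrative shapes: src1 is 32x64, src2 is 64x16 in MxK / KxN terms).
    src1.allocator()->init(TensorInfo(TensorShape(64U, 32U), 1, DataType::F32));
    src2.allocator()->init(TensorInfo(TensorShape(16U, 64U), 1, DataType::F32));

    // Quantized copies of the inputs (scale/offset values are assumptions).
    q_src1.allocator()->init(TensorInfo(TensorShape(64U, 32U), 1, DataType::QASYMM8_SIGNED,
                                        QuantizationInfo(0.05f, 0)));
    q_src2.allocator()->init(TensorInfo(TensorShape(16U, 64U), 1, DataType::QASYMM8_SIGNED,
                                        QuantizationInfo(0.1f, 0)));

    // Raw int32 accumulator output of the low-precision GEMM.
    q_res.allocator()->init(TensorInfo(TensorShape(16U, 32U), 1, DataType::S32));

    NEQuantizationLayer quant1, quant2;
    NEGEMMLowpMatrixMultiplyCore matmul;

    quant1.configure(&src1, &q_src1);
    quant2.configure(&src2, &q_src2);
    matmul.configure(&q_src1, &q_src2, nullptr, &q_res);

    for (Tensor* t : {&src1, &src2, &q_src1, &q_src2, &q_res}) {
        t->allocator()->allocate();
    }

    // ... fill src1/src2 with data, then:
    quant1.run();
    quant2.run();
    matmul.run();
}
```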

Help wanted

Hi guys, I'm extremely interested in speeding up int8 `MatMul` inference with an ARM Compute Library kernel. My model is:

```mermaid
graph TD;
  Input1["Input out: fp32"]
  Quantise1["NEQuantizationLayer out: signed int8"]
  Input2["Input...
```

Help wanted

### Details:
- *item1*
- *...*

### Tickets:
- *ticket-id*

category: CPU
platform: arm
do_not_review
do_not_merge