
NEGEMMLowpMatrixMultiplyCore: set_pretranspose_A & set_pretranspose_B support

Open eshoguli opened this issue 1 year ago • 1 comment

Model:

graph TD;
    Input1["Input
    src1: fp32"]
    Quantise1["NEQuantizationLayer
    q_src1: QASYMM8_SIGNED"]
    Input2["Input
    src2: fp32"]
    Quantise2["NEQuantizationLayer
    q_src2: QASYMM8_SIGNED"]
    MatMul["NEGEMMLowpMatrixMultiplyCore
    q_res: S8"]

    Input1-->Quantise1;
    Input2-->Quantise2;
    Quantise1-->MatMul;
    Quantise2-->MatMul;
    MatMul-->Result;

Can you confirm that NEGEMMLowpMatrixMultiplyCore doesn't support transposed matrices, and fix the validation so it matches the implementation (see experiment 2 below), please?

Experiment 1 (reference): Without transpose of input tensors, everything works as expected:

size_t n = 1;
size_t c = 1;
// A matrix: a1 x a2
size_t a1 = 6;
size_t a2 = 3;
// B matrix: b1 x b2
size_t b1 = 3;
size_t b2 = 6;

// Allocate input tensors
src1.allocator()->init(TensorInfo(TensorShape(a1, a2, c, n), 1, DataType::F32));
src2.allocator()->init(TensorInfo(TensorShape(b1, b2, c, n), 1, DataType::F32));

// Allocate & fill matrices
...

// We now have the quantisation info and can configure the quantised tensors
q_src1.allocator()->init(TensorInfo(TensorShape(a1, a2, c, n), 1, DataType::QASYMM8_SIGNED, src1_qinfo));
q_src2.allocator()->init(TensorInfo(TensorShape(b1, b2, c, n), 1, DataType::QASYMM8_SIGNED, src2_qinfo));

// Configure low precision gemm and initialise result tensor
NEGEMMLowpMatrixMultiplyCore qgemm;
q_res.allocator()->init(TensorInfo(TensorShape(a2, b1, c, n), 1, DataType::S32));
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res);

// Allocate all tensors & run
...

Experiment 2: the input tensor dimensions were NOT updated, but a transpose of the B matrix was requested with gemm_info.set_pretranspose_B(true);. Expectation: failure, because the matrix dimensions are no longer compatible. Result: behaves exactly like the reference: no validation error, no failure, and the same results as the reference.

...
// Configure low precision gemm and initialise result tensor
NEGEMMLowpMatrixMultiplyCore qgemm;
q_res.allocator()->init(TensorInfo(TensorShape(a2, b1, c, n), 1, DataType::S32));
GEMMInfo gemm_info; // <= new line
gemm_info.set_pretranspose_B(true); // <= new line
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res, gemm_info);
...

Experiment 3: the input tensor dimensions WERE updated and the transpose of the B matrix was requested with gemm_info.set_pretranspose_B(true);. Expectation: works like the reference and produces the same results. Result: failure with a validation error:

terminating due to uncaught exception of type std::runtime_error: in validate src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp:351: The product AB is defined only if the number of columns in A is equal to the number of rows in B.

size_t n = 1;
size_t c = 1;
// A matrix: a1 x a2
size_t a1 = 6;
size_t a2 = 3;
// B matrix: b1 x b2
size_t b1 = 6; // <= updated here: previous value is 3
size_t b2 = 3; // <= updated here: previous value is 6
...
// Configure low precision gemm and initialise result tensor
NEGEMMLowpMatrixMultiplyCore qgemm;
q_res.allocator()->init(TensorInfo(TensorShape(a2, b1, c, n), 1, DataType::S32));
GEMMInfo gemm_info; // <= new line
gemm_info.set_pretranspose_B(true); // <= new line
qgemm.configure(&q_src1, &q_src2, nullptr, &q_res, gemm_info);
...

eshoguli avatar Jul 19 '24 12:07 eshoguli

Hi @eshoguli

This patch changes the ::validate() method to return an error when either of the pretranspose options is set to true. The implementation of NEGEMMLowpMatrixMultiplyCore does not support pretranspose, so previously the flag was silently ignored.

Hope this helps

morgolock avatar Oct 03 '24 12:10 morgolock