openvino icon indicating copy to clipboard operation
openvino copied to clipboard

[RISCV64][SHL] Added FC FP32 executor

Open a-sidorova opened this issue 1 year ago • 5 comments

Details:

  • Reused FC RVV from SHL
  • The PR to SHL dev branch with accuracy fix for FC f32: https://github.com/openvinotoolkit/shl/pull/3

Tickets:

  • N/A

TODO:

  • [x] Fix execType: gemm_f32
  • [x] Added wrapper for csinn_tensor and csinn_session to allocate these structures and deallocate them

Prerequisites:

  • [x] https://github.com/openvinotoolkit/openvino/pull/23901

a-sidorova avatar Apr 10 '24 16:04 a-sidorova

@EgorDuplensky Could you please review the PR?

dmitry-gorokhov avatar Apr 16 '24 08:04 dmitry-gorokhov

Any plans regarding the tests? Is there any RISCV emulator or something?

I used the current common FC tests.

As for emulators,

git clone https://github.com/T-head-Semi/xuantie-gnu-toolchain.git
 ./configure --prefix=/opt/riscv
make linux build-qemu

/opt/riscv/bin/qemu-riscv64 -cpu c910v ./ov_cpu_func_tests

a-sidorova avatar Jun 05 '24 04:06 a-sidorova

This PR will be closed in a week because of 2 weeks of no activity.

github-actions[bot] avatar Jun 20 '24 00:06 github-actions[bot]

@EgorDuplensky rebased on the latest master and also add the following changes to the latest commit https://github.com/openvinotoolkit/openvino/pull/23964/commits/12b9a9f2fa743b484cc10aec370176a8b1b8bc9c:

  • Added SHL tests for FC
  • Disabled SHL FC execution if weights are not transposed because I didn't find API for non transposed weights repacking in SHL. As an idea to disable this optimization in GraphOptimizer but I'm not sure that we need to do it for now 🤔

a-sidorova avatar Jun 28 '24 07:06 a-sidorova

@EgorDuplensky rebased on the latest master and also add the following changes to the latest commit 12b9a9f:

  • Added SHL tests for FC
  • Disabled SHL FC execution if weights are not transposed because I didn't find API for non transposed weights repacking in SHL. As an idea to disable this optimization in GraphOptimizer but I'm not sure that we need to do it for now 🤔

Just wondering, is there any weights packing actually happening underneath? Or this is just shl fc not supporting a transposed weights? Anyway, if shl fc weights are shape agnostic, we could just run i.e. some ref transpose in scope of the shl executor constructor.

EgorDuplensky avatar Jul 11 '24 10:07 EgorDuplensky

@EgorDuplensky rebased on the latest master and also add the following changes to the latest commit 12b9a9f:

  • Added SHL tests for FC
  • Disabled SHL FC execution if weights are not transposed because I didn't find API for non transposed weights repacking in SHL. As an idea to disable this optimization in GraphOptimizer but I'm not sure that we need to do it for now 🤔

Just wondering, is there any weights packing actually happening underneath? Or this is just shl fc not supporting a transposed weights? Anyway, if shl fc weights are shape agnostic, we could just run i.e. some ref transpose in scope of the shl executor constructor.

  1. SHL doesn't support transposed weights. I don't see any checks, transposing functions or even fields in csinn_fc_params. So I think that the library supports only [N, K] weights and cannot transpose [K,N] to [N,K] itself. As an idea, we can use shl_rvv_transpose_fp32 (with batch = 1) here instead of disabling FuseFCAndTransposeOnWeights.
  2. SHL FC implementation makes repacking of weights (not transposed) before execution in initialization function. However, this function corrupts the original weights pointer (writes repacked weights to the same pointer). It leads to segfaults. To fix it, I moved this repacking to execution function. For sure, it's extra overheads on execution. And now (after review and a few months), I got how we can improve it: we can make a copy of the original weights, save them to weights cache (if possible) and pass them to initialization function for the repacking - we don't corrupt the original weights. After that we will execute FC with packed weights and no need to repack weights on each execution. However, I'd like to suggest to do it in the separate PR to not to delay the GSOC development and allow the student to merge their own changes to master branch.

cc @dmitry-gorokhov

a-sidorova avatar Jul 17 '24 07:07 a-sidorova