openvino
openvino copied to clipboard
[RISCV64][SHL] Added FC FP32 executor
Details:
- Reused FC RVV from SHL
- The PR to SHL dev branch with accuracy fix for FC f32: https://github.com/openvinotoolkit/shl/pull/3
Tickets:
- N/A
TODO:
- [x] Fix
execType: gemm_f32 - [x] Added wrapper for
csinn_tensorandcsinn_sessionto allocate these structures and deallocate them
Prerequisites:
- [x] https://github.com/openvinotoolkit/openvino/pull/23901
@EgorDuplensky Could you please review the PR?
Any plans regarding the tests? Is there any RISCV emulator or something?
I used the current common FC tests.
As for emulators,
git clone https://github.com/T-head-Semi/xuantie-gnu-toolchain.git
./configure --prefix=/opt/riscv
make linux build-qemu
/opt/riscv/bin/qemu-riscv64 -cpu c910v ./ov_cpu_func_tests
This PR will be closed in a week because of 2 weeks of no activity.
@EgorDuplensky rebased on the latest master and also add the following changes to the latest commit https://github.com/openvinotoolkit/openvino/pull/23964/commits/12b9a9f2fa743b484cc10aec370176a8b1b8bc9c:
- Added SHL tests for FC
- Disabled SHL FC execution if weights are not transposed because I didn't find API for non transposed weights repacking in SHL. As an idea to disable this optimization in
GraphOptimizerbut I'm not sure that we need to do it for now 🤔
@EgorDuplensky rebased on the latest master and also add the following changes to the latest commit 12b9a9f:
- Added SHL tests for FC
- Disabled SHL FC execution if weights are not transposed because I didn't find API for non transposed weights repacking in SHL. As an idea to disable this optimization in
GraphOptimizerbut I'm not sure that we need to do it for now 🤔
Just wondering, is there any weights packing actually happening underneath? Or this is just shl fc not supporting a transposed weights? Anyway, if shl fc weights are shape agnostic, we could just run i.e. some ref transpose in scope of the shl executor constructor.
@EgorDuplensky rebased on the latest master and also add the following changes to the latest commit 12b9a9f:
- Added SHL tests for FC
- Disabled SHL FC execution if weights are not transposed because I didn't find API for non transposed weights repacking in SHL. As an idea to disable this optimization in
GraphOptimizerbut I'm not sure that we need to do it for now 🤔Just wondering, is there any weights packing actually happening underneath? Or this is just shl fc not supporting a transposed weights? Anyway, if shl fc weights are shape agnostic, we could just run i.e. some ref transpose in scope of the shl executor constructor.
- SHL doesn't support transposed weights. I don't see any checks, transposing functions or even fields in
csinn_fc_params. So I think that the library supports only[N, K]weights and cannot transpose[K,N]to[N,K]itself. As an idea, we can use shl_rvv_transpose_fp32 (withbatch = 1) here instead of disablingFuseFCAndTransposeOnWeights. - SHL FC implementation makes repacking of weights (not transposed) before execution in initialization function. However, this function corrupts the original weights pointer (writes repacked weights to the same pointer). It leads to segfaults. To fix it, I moved this repacking to execution function. For sure, it's extra overheads on execution. And now (after review and a few months), I got how we can improve it: we can make a copy of the original weights, save them to weights cache (if possible) and pass them to initialization function for the repacking - we don't corrupt the original weights. After that we will execute FC with packed weights and no need to repack weights on each execution. However, I'd like to suggest to do it in the separate PR to not to delay the GSOC development and allow the student to merge their own changes to master branch.
cc @dmitry-gorokhov