
opt(RVV): Optimize max and min float functions with intrinsics

Open • ihb2032 opened this issue 1 month ago • 0 comments

Summary

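Optimize MNNMaxFloat and MNNMinFloat using RVV (RISC-V Vector extension) intrinsics. In the benchmark logs below, the RVV versions are about 3.6x to 6.3x faster than the scalar versions for inputCountUnit of 65536 and above. For inputCountUnit of 4 or less, both timings are below the printed precision (0.0000 sec).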
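As a rough illustration of the kind of kernel involved, here is a minimal sketch of a strip-mined max reduction written with RVV intrinsics. It is not the code from this change: `max_float_sketch` and its simplified signature (a flat array and a length) are hypothetical, the intrinsic names assume the v1.0 RVV C intrinsics (`__riscv_` prefix), LMUL=8 is an arbitrary choice, and a scalar fallback is included so the file still builds on targets without the vector extension.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical helper, NOT MNN's actual MNNMaxFloat signature:
 * returns the maximum of src[0..n-1]. Requires n > 0. */
float max_float_sketch(const float *src, size_t n) {
    assert(src != NULL && n > 0);
#if defined(__riscv_vector)
    /* Running maximum lives in element 0 of an m1 register,
     * seeded with the first input value. */
    vfloat32m1_t acc =
        __riscv_vfmv_v_f_f32m1(src[0], __riscv_vsetvlmax_e32m1());
    size_t i = 0;
    while (i < n) {
        /* vl = number of elements handled in this strip (tail included). */
        size_t vl = __riscv_vsetvl_e32m8(n - i);
        vfloat32m8_t v = __riscv_vle32_v_f32m8(src + i, vl);
        /* Reduce the strip together with the running maximum. */
        acc = __riscv_vfredmax_vs_f32m8_f32m1(v, acc, vl);
        i += vl;
    }
    return __riscv_vfmv_f_s_f32m1_f32(acc);
#else
    /* Scalar fallback for non-RVV targets. */
    float best = src[0];
    for (size_t i = 1; i < n; ++i) {
        if (src[i] > best) {
            best = src[i];
        }
    }
    return best;
#endif
}
```

A min variant under the same assumptions would swap `__riscv_vfredmax_vs_f32m8_f32m1` for `__riscv_vfredmin_vs_f32m8_f32m1`. Reducing once per strip keeps tail handling simple; accumulating element-wise with `vfmax` and reducing once at the end is likely faster but needs a tail-undisturbed policy for the last strip.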

Environment

  • Platform: SG2044
  • OS: EulixOS 3.0

Benchmark

<details>
<summary>Click to expand full test logs</summary>

```
[root@openeuler-riscv64 hebo]# ./test_max_float
inputCountUnit=4
Scalar time: 0.0000 sec
RVV time   : 0.0000 sec
Speedup    : 0.04x
Test inputCountUnit=4: PASSED
inputCountUnit=1
Scalar time: 0.0000 sec
RVV time   : 0.0000 sec
Speedup    : 0.00x
Test inputCountUnit=1: PASSED
inputCountUnit=3
Scalar time: 0.0000 sec
RVV time   : 0.0000 sec
Speedup    : 0.00x
Test inputCountUnit=3: PASSED
inputCountUnit=65536
Scalar time: 0.0086 sec
RVV time   : 0.0017 sec
Speedup    : 5.02x
Test inputCountUnit=65536: PASSED
inputCountUnit=1000000
Scalar time: 0.1321 sec
RVV time   : 0.0338 sec
Speedup    : 3.91x
Test inputCountUnit=1000000: PASSED
inputCountUnit=10000000
Scalar time: 1.3309 sec
RVV time   : 0.3729 sec
Speedup    : 3.57x
Test inputCountUnit=10000000: PASSED

All tests PASSED
[root@openeuler-riscv64 hebo]# ./test_min_float
inputCountUnit=4
Scalar time: 0.0000 sec
RVV time   : 0.0000 sec
Speedup    : 0.08x
Test inputCountUnit=4: PASSED
inputCountUnit=1
Scalar time: 0.0000 sec
RVV time   : 0.0000 sec
Speedup    : 0.00x
Test inputCountUnit=1: PASSED
inputCountUnit=3
Scalar time: 0.0000 sec
RVV time   : 0.0000 sec
Speedup    : 1.00x
Test inputCountUnit=3: PASSED
inputCountUnit=65536
Scalar time: 0.0105 sec
RVV time   : 0.0017 sec
Speedup    : 6.34x
Test inputCountUnit=65536: PASSED
inputCountUnit=1000000
Scalar time: 0.1587 sec
RVV time   : 0.0340 sec
Speedup    : 4.67x
Test inputCountUnit=1000000: PASSED
inputCountUnit=10000000
Scalar time: 1.5811 sec
RVV time   : 0.3776 sec
Speedup    : 4.19x
Test inputCountUnit=10000000: PASSED

All tests PASSED
```

</details>
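The source of `test_max_float` and `test_min_float` is not included in this issue, so the following is only a sketch of how a speedup figure like the ones in the logs can be computed, and of one common way to get readable timings for very small inputs: repeat the call many times and report the average per call. The names `kernel_fn`, `time_per_call`, and `speedup` are hypothetical, and `clock()` measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one invocation of the kernel under test. */
typedef void (*kernel_fn)(void *ctx);

/* Average CPU seconds per call over `reps` back-to-back calls.
 * Repeating the call lifts tiny kernels above the timer resolution. */
double time_per_call(kernel_fn fn, void *ctx, int reps) {
    assert(fn != NULL && reps > 0);
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r) {
        fn(ctx);
    }
    clock_t t1 = clock();
    return ((double)(t1 - t0) / (double)CLOCKS_PER_SEC) / (double)reps;
}

/* Speedup = scalar time / RVV time.
 * Returns 0.0 when the RVV time is not positive (ratio undefined). */
double speedup(double scalar_sec, double rvv_sec) {
    return rvv_sec > 0.0 ? scalar_sec / rvv_sec : 0.0;
}
```

With this approach, a size such as inputCountUnit=4 would be timed over, say, a million repetitions per variant before the ratio is taken, rather than from a single call.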

ihb2032 • Dec 01 '25 01:12