opt(RVV): Optimize top1 functions with intrinsics
Summary
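Optimize `MNNVectorTop1Float` and `MNNVectorTop1Int32` using RVV (RISC-V Vector extension) intrinsics. In the benchmark logs below, inputs of 1,024 or more elements run roughly 5x to 7.4x faster for float and 6.1x to 6.5x faster for int32 than the scalar baseline.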
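For readers unfamiliar with the operation, here is a minimal sketch of what a top-1 (max value plus its index) kernel can look like. It is an illustration under stated assumptions, not the code in this PR: the names `top1_float`, `top1_float_scalar`, `top1_float_rvv`, and `kUnit` are hypothetical, the unit size of 4 is inferred from the logs (`inputCountUnit=256` gives `Total Elements=1024`), the input is assumed non-empty and NaN-free, and the two-pass reduce-then-search strategy is just one possible way to vectorize the search. The RVV path assumes a toolchain that provides the `__riscv_`-prefixed intrinsics; elsewhere the scalar path is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Use the RVV path only when the compiler advertises the prefixed intrinsics API.
#if defined(__riscv_vector) && defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 11000
#include <riscv_vector.h>
#define TOP1_HAS_RVV 1
#else
#define TOP1_HAS_RVV 0
#endif

// Assumption: one "unit" packs 4 elements, as the benchmark logs suggest.
constexpr size_t kUnit = 4;

// Scalar reference: largest value and the first index where it occurs.
void top1_float_scalar(const float* input, float* maxValue, int32_t* maxIndex,
                       size_t inputCountUnit) {
    assert(inputCountUnit > 0);
    const size_t n = inputCountUnit * kUnit;
    float best = input[0];
    int32_t bestIdx = 0;
    for (size_t i = 1; i < n; ++i) {
        if (input[i] > best) {  // strict '>' keeps the earliest maximum
            best = input[i];
            bestIdx = static_cast<int32_t>(i);
        }
    }
    *maxValue = best;
    *maxIndex = bestIdx;
}

#if TOP1_HAS_RVV
// Two-pass RVV sketch: reduce to the maximum, then locate its first occurrence.
void top1_float_rvv(const float* input, float* maxValue, int32_t* maxIndex,
                    size_t inputCountUnit) {
    assert(inputCountUnit > 0);
    const size_t n = inputCountUnit * kUnit;

    // Pass 1: fold every strip into a running maximum kept in element 0 of acc.
    vfloat32m1_t acc = __riscv_vfmv_s_f_f32m1(input[0], 1);
    for (size_t i = 0; i < n;) {
        const size_t vl = __riscv_vsetvl_e32m8(n - i);
        const vfloat32m8_t v = __riscv_vle32_v_f32m8(input + i, vl);
        acc = __riscv_vfredmax_vs_f32m8_f32m1(v, acc, vl);
        i += vl;
    }
    const float best = __riscv_vfmv_f_s_f32m1_f32(acc);

    // Pass 2: stop at the first strip that contains an element equal to the maximum.
    int32_t bestIdx = 0;
    for (size_t i = 0; i < n;) {
        const size_t vl = __riscv_vsetvl_e32m8(n - i);
        const vfloat32m8_t v = __riscv_vle32_v_f32m8(input + i, vl);
        const vbool4_t eq = __riscv_vmfeq_vf_f32m8_b4(v, best, vl);
        const long pos = __riscv_vfirst_m_b4(eq, vl);  // -1 when no lane matches
        if (pos >= 0) {
            bestIdx = static_cast<int32_t>(i + static_cast<size_t>(pos));
            break;
        }
        i += vl;
    }
    *maxValue = best;
    *maxIndex = bestIdx;
}
#endif

// Dispatch to the RVV path when it was compiled in, otherwise to the scalar path.
void top1_float(const float* input, float* maxValue, int32_t* maxIndex,
                size_t inputCountUnit) {
#if TOP1_HAS_RVV
    top1_float_rvv(input, maxValue, maxIndex, inputCountUnit);
#else
    top1_float_scalar(input, maxValue, maxIndex, inputCountUnit);
#endif
}
```

Calling `top1_float(data.data(), &value, &index, data.size() / kUnit)` on a buffer whose length is a multiple of 4 fills `value` and `index`. The second pass usually exits early, so the cost is dominated by the reduction; an int32 variant would follow the same shape with the integer load, `vredmax`, and `vmseq` intrinsics.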
Environment
- Platform: Banana PI BPI-F3
- OS: EulixOS 3.0
Benchmark
<details>
<summary>Click to expand full test logs</summary>
[root@EulixOS ~]# ./test_vector_top1_float
inputCountUnit=1 (Total Elements=4)
Scalar time: 0.0000 sec
RVV time : 0.0000 sec
Speedup : 0.05x
Test inputCountUnit=1: PASSED
inputCountUnit=16 (Total Elements=64)
Scalar time: 0.0000 sec
RVV time : 0.0000 sec
Speedup : 2.00x
Test inputCountUnit=16: PASSED
inputCountUnit=256 (Total Elements=1024)
Scalar time: 0.0000 sec
RVV time : 0.0000 sec
Speedup : 4.95x
Test inputCountUnit=256: PASSED
inputCountUnit=1024 (Total Elements=4096)
Scalar time: 0.0001 sec
RVV time : 0.0000 sec
Speedup : 6.32x
Test inputCountUnit=1024: PASSED
inputCountUnit=1023 (Total Elements=4092)
Scalar time: 0.0001 sec
RVV time : 0.0000 sec
Speedup : 5.68x
Test inputCountUnit=1023: PASSED
inputCountUnit=100000 (Total Elements=400000)
Scalar time: 0.0094 sec
RVV time : 0.0013 sec
Speedup : 7.44x
Test inputCountUnit=100000: PASSED
inputCountUnit=1000000 (Total Elements=4000000)
Scalar time: 0.0940 sec
RVV time : 0.0128 sec
Speedup : 7.34x
Test inputCountUnit=1000000: PASSED
inputCountUnit=10000000 (Total Elements=40000000)
Scalar time: 0.9406 sec
RVV time : 0.1278 sec
Speedup : 7.36x
Test inputCountUnit=10000000: PASSED
All tests PASSED
[root@EulixOS ~]# ./test_vector_top1_int32
inputCountUnit=1 (Total Elements=4)
Scalar time: 0.0000 sec
RVV time : 0.0000 sec
Speedup : 0.00x
Test inputCountUnit=1: PASSED
inputCountUnit=16 (Total Elements=64)
Scalar time: 0.0000 sec
RVV time : 0.0000 sec
Speedup : 0.00x
Test inputCountUnit=16: PASSED
inputCountUnit=256 (Total Elements=1024)
Scalar time: 0.0000 sec
RVV time : 0.0000 sec
Speedup : 6.38x
Test inputCountUnit=256: PASSED
inputCountUnit=1024 (Total Elements=4096)
Scalar time: 0.0001 sec
RVV time : 0.0000 sec
Speedup : 6.06x
Test inputCountUnit=1024: PASSED
inputCountUnit=1023 (Total Elements=4092)
Scalar time: 0.0001 sec
RVV time : 0.0000 sec
Speedup : 6.54x
Test inputCountUnit=1023: PASSED
inputCountUnit=100000 (Total Elements=400000)
Scalar time: 0.0076 sec
RVV time : 0.0012 sec
Speedup : 6.25x
Test inputCountUnit=100000: PASSED
inputCountUnit=1000000 (Total Elements=4000000)
Scalar time: 0.0770 sec
RVV time : 0.0120 sec
Speedup : 6.39x
Test inputCountUnit=1000000: PASSED
inputCountUnit=10000000 (Total Elements=40000000)
Scalar time: 0.7694 sec
RVV time : 0.1210 sec
Speedup : 6.36x
Test inputCountUnit=10000000: PASSED
All tests PASSED
</details>
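The smallest cases print `0.0000 sec` for both timings, so their speedup figures (0.05x, 2.00x, 0.00x) are likely dominated by timer resolution rather than kernel speed. The sketch below shows one way a harness could average over many repetitions to get usable numbers for tiny inputs. It is not the test program used for the logs above; `seconds_per_call` and `speedup` are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: average wall-clock seconds per call over `repeats` runs.
// One untimed warm-up call is made first so cold caches do not skew the result.
template <typename Fn>
double seconds_per_call(Fn&& fn, int repeats) {
    assert(repeats > 0);
    fn();  // warm-up, not timed
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / repeats;
}

// Hypothetical helper: baseline time divided by optimized time.
// Returns 0 when the optimized time is not measurable, instead of dividing by zero.
double speedup(double baselineSec, double optimizedSec) {
    return optimizedSec > 0.0 ? baselineSec / optimizedSec : 0.0;
}
```

As a sanity check against the logs, `speedup(0.0940, 0.0128)` evaluates to about 7.34, matching the reported figure for the 4,000,000-element float case. When timing a real kernel, the callable should write its result to a location the compiler cannot discard (for example a `volatile` variable), otherwise the loop may be optimized away.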